Query containment and query answering are two important computational tasks in databases.
While query answering amounts to computing the result of a query over a database, query containment
is the problem of checking whether, for every database, the result of one query is a subset of
the result of another query.

In this paper, we deal with unions of conjunctive queries, and we address query containment 
and query answering under Description Logic constraints. Every such constraint is essentially an
inclusion dependency between concepts and relations, and the expressive power of these constraints is due to the
possibility of using complex expressions, e.g., intersection and difference of relations, special forms
of quantification, regular expressions over binary relations, in the specification of the dependencies.
These types of constraints capture a great variety of data models, including the relational, the 
entity-relationship, and the object-oriented model, all extended with various forms of constraints, 
and also the basic features of the ontology languages used in the context of the Semantic Web. 

We present the following results on both query containment and query answering. We provide a 
method for query containment under Description Logic constraints, thus showing that the problem 
is decidable, and analyze its computational complexity. We prove that query containment is 
undecidable in the case where we allow inequalities in the right-hand side query, even for very 
simple constraints and queries. We show that query answering under Description Logic constraints 
can be reduced to query containment, and illustrate how such a reduction provides upper bound 
results with respect to both combined and data complexity. 
1. INTRODUCTION

Query containment and query answering are two important computational tasks in
databases. While query answering amounts to computing the result of a query over a
database, query containment is the problem of checking whether, for every database,
the result of one query is a subset of the result of another query¹. Many papers
point out that checking containment is a relevant task in several contexts, including 
information integration [Ullman 1997], query optimization [Abiteboul et al. 1995; 
Aho et al. 1979a], (materialized) view maintenance [Gupta and Mumick 1995], 
data warehousing [Widom (ed.) 1995], constraint checking [Gupta et al. 1994], and 
semantic caching [Amir et al. 2003]. 

In this paper, we deal with query containment and query answering under in- 
tegrity constraints, or simply constraints. 

The former is the problem of checking whether containment between two queries 
holds for every database satisfying a given set of constraints. This problem arises
in those situations where one wants to check query containment relative to a
database schema specified with a rich data definition language. For example, in the
case of information integration, queries are often to be compared relative to (inter-schema)
constraints, which are used to declaratively specify the "glue" between two
source schemas, and between one source schema and the global schema [Calvanese
et al. 1998; Hull 1997; Ullman 1997; Catarci and Lenzerini 1993; Levy et al. 1995;
Lenzerini 2002; Halevy 2001].

The complexity of query containment in the absence of constraints has been 
studied in various settings. In [Chandra and Merlin 1977], NP-completeness has 
been established for conjunctive queries, and in [Chekuri and Rajaraman 1997] 
a multi-parameter analysis has been performed for the same case, showing that 
the intractability is due to certain types of cycles in the queries. In [Klug 1988; 
van der Meyden 1998], $\Pi_2^p$-completeness of containment of conjunctive queries with
inequalities was proved, and in [Sagiv and Yannakakis 1980] the case of queries 
with the union and difference operators was studied. For various classes of Datalog 
queries with inequalities, decidability and undecidability results were presented in 
[Chaudhuri and Vardi 1992; van der Meyden 1998; Bonatti 2004; Calvanese et al. 
2003], respectively. 
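As an illustration of the constraint-free case, the following minimal sketch decides containment of two conjunctive queries with the classical canonical-database argument of [Chandra and Merlin 1977]: the body of the left-hand query is frozen into a set of facts, and containment holds exactly when the right-hand query maps homomorphically onto those facts while preserving the head. The representation (atoms as `(predicate, arguments)` pairs, variables written with a leading `?`) and all function names are assumptions made for this example; heads are assumed to be tuples of variables of equal length.

```python
def is_var(term):
    # Convention of this sketch: variables are strings that start with "?".
    return isinstance(term, str) and term.startswith("?")

def freeze(term):
    # Turn a variable into a fresh constant of the canonical database.
    return ("frozen", term) if is_var(term) else term

def find_homomorphism(atoms, facts, mapping):
    """Return a variable assignment sending every atom onto a fact, or None."""
    if not atoms:
        return mapping
    (pred, args), rest = atoms[0], atoms[1:]
    for fact_pred, fact_args in facts:
        if fact_pred != pred or len(fact_args) != len(args):
            continue
        candidate = dict(mapping)
        ok = True
        for term, value in zip(args, fact_args):
            if is_var(term):
                # Bind the variable, or check the existing binding.
                if candidate.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            result = find_homomorphism(rest, facts, candidate)
            if result is not None:
                return result
    return None

def contained_in(q1, q2):
    """True iff q1 is contained in q2 (no constraints, set semantics)."""
    head1, body1 = q1
    head2, body2 = q2
    # Canonical database of q1: its body with variables frozen.
    facts = {(pred, tuple(freeze(t) for t in args)) for pred, args in body1}
    mapping = {}
    for t1, t2 in zip(head1, head2):
        # The head of q2 must be sent to the frozen head of q1.
        if mapping.setdefault(t2, freeze(t1)) != freeze(t1):
            return False
    return find_homomorphism(list(body2), facts, mapping) is not None

two_steps = (("?x",), [("E", ("?x", "?y")), ("E", ("?y", "?z"))])
one_step = (("?x",), [("E", ("?x", "?y"))])
print(contained_in(two_steps, one_step), contained_in(one_step, two_steps))
```

The backtracking search is exponential in the worst case, in line with the NP-completeness result cited above. It does not take constraints into account, which is precisely the limitation the rest of the paper addresses.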

Query containment under constraints has also been the subject of several inves- 
tigations. For example, decidability of conjunctive query containment was inves- 
tigated in [Aho et al. 1979b] under functional and multi-valued dependencies, in 
[Johnson and Klug 1984] under functional and inclusion dependencies, in [Chan 
1992; Levy and Rousset 1996; Levy and Suciu 1997] under constraints representing 
is-a hierarchies and complex objects, and in [Dong and Su 1996] in the case of con- 
straints represented as Datalog programs. Undecidability is proved in [Calvanese 
and Rosati 2003] for recursive queries under inclusion dependencies. Several results 
on containment of XML queries under constraints expressed as DTDs are reported 
in [Neven and Schwentick 2003; Wood 2003].

Query answering under constraints is the problem of computing the answers to a
query over an incomplete database relative to a set of constraints [van der Meyden
1998]. Since an incomplete database is partially specified, this task amounts to
computing the tuples that satisfy the query in every database that conforms to the
partial specification and satisfies the constraints. It is well known in the database
literature that there is a tight connection between the problems of conjunctive
query containment and conjunctive query answering [Chandra and Merlin 1977].
Since this relationship also holds in the presence of constraints, most of the results
reported above apply to query answering as well. In this paper, we concentrate
mainly on query containment, and address query answering only in Section 5.

¹ We refer to the set semantics of query containment. Bag semantics is studied, for example, in
[Ioannidis and Ramakrishnan 1995].

In this paper², we address query containment and answering in a setting where:

- The schema is constituted by concepts (unary relations) and relations as basic
elements, and by a set of constraints expressed in a variant of Description Logics
[Baader et al. 2003]. Every constraint is an inclusion of the form $\alpha_1 \sqsubseteq \alpha_2$, where $\alpha_1$
and $\alpha_2$ are complex expressions built by using intersection and difference of relations,
special forms of quantification, regular expressions over binary relations, and
number restrictions (i.e., cardinality constraints imposing limitations on the number
of tuples in a certain relation in which an object may appear). The constraints
essentially express inclusion dependencies between concepts and relations, and their
expressive power is due to the possibility of using complex expressions in the
specification of the dependencies. It can be shown that our formalism is able to capture
a great variety of data models, including the relational, the entity-relationship, and
the object-oriented model, all extended with various forms of constraints. The
relevance of the constraints dealt with in this paper is also testified by the large interest
that the Semantic Web community expresses towards Description Logics. Indeed,
several papers point out that ontologies play a key role in developing Semantic Web
tools [Gruber 1993], and Description Logics are regarded as the main formalisms for
the specification of ontologies in this context [Patel-Schneider et al. 2004]. Despite
this interest, the results presented in this paper can be considered one of the first
formal analyses of querying ontologies.

- Queries are formed as disjunctions of conjunctive queries whose atoms are 
concepts and relations, and therefore can express non-recursive Datalog programs. 

- An incomplete database is specified as a set of facts asserting that a specific 
object is an instance of a concept, or that a specific tuple of objects is an instance 
of a relation. As we said before, an incomplete database $\mathcal{D}$ is intended to provide
a partial specification of a database, in the sense that a database conforming to $\mathcal{D}$
contains all facts explicitly asserted in $\mathcal{D}$, and may contain additional instances of
concepts and relations.

We observe that, given the form of constraints and queries allowed in
our approach, none of the previous results can be applied to get
decidability/undecidability of query containment and query answering in our setting.

² This paper is an improved and extended version of part of [Calvanese et al. 1998].

ACM Transactions on Computational Logic, Vol. V, No. N, February 2008.

Diego Calvanese et al.

We present the following results on both query containment and query answering:

(1) We provide a method for query containment under Description Logic constraints,
thus showing that the problem is decidable, and analyze its computational
complexity. This result is obtained by adopting a novel technique
for addressing the problem, based on translating the schema and the queries
into a particular Propositional Dynamic Logic (PDL) formula, and then checking
the unsatisfiability of the formula. The technique is justified by the fact
that reasoning about the schema itself (without the queries) is optimally done
within the framework of PDL [De Giacomo and Lenzerini 1996].

(2) We prove that query containment is undecidable in the case where we allow 
inequalities in the right-hand side query, even for very simple constraints and 
queries. 

(3) We show that query answering under Description Logic constraints can be 
reduced to query containment, and illustrate how such a reduction provides 
upper bound results with respect to both combined and data complexity. 

The paper is organized as follows. In Section 2, we present the formalism used 
to express both the constraints in the schema, and the queries. In Section 3, we 
deal with query containment. In particular, in Subsection 3.1 we describe the 
logic $\mathrm{CPDL}_g$, which will be used for deciding query containment, in Subsection 3.2
we describe the reduction of query containment to unsatisfiability in $\mathrm{CPDL}_g$, in
Subsection 3.3 we prove its correctness, and in Subsection 3.4 we analyze the complexity
bounds for checking containment of queries. In Section 4, we show undecidability 
of query containment in the presence of inequalities. In Section 5, we deal with 
query answering, and in Section 6 we conclude the paper. 

2. SCHEMAS AND QUERIES IN $\mathcal{DLR}_{reg}$

To specify database schemas and queries, we use the logical language $\mathcal{DLR}_{reg}$,
inspired by [Catarci and Lenzerini 1993; Calvanese et al. 1995], belonging to the 
family of (expressive) Description Logics [Calvanese et al. 2001; Baader et al. 2003]. 
The language is based on the relational model, in the sense that a schema S de- 
scribes the properties of a set of relations, while a query for S denotes a relation 
that is supposed to be computed from any database conforming to S. A schema is 
specified in terms of a set of assertions on relations, which express the constraints 
that must be satisfied by every conforming database. 

2.1 Schemas 

The basic elements of $\mathcal{DLR}_{reg}$ are concepts (unary relations), n-ary relations, and
regular expressions built over projections of relations on two of their components³.

We assume to deal with a finite set of atomic concepts and relations, denoted
by A and P respectively. We use C to denote arbitrary concepts, R to denote
arbitrary relations (of given arity between 2 and $n_{max}$), and E to denote regular
expressions, respectively built according to the following syntax:

$$
\begin{array}{rcl}
C &::=& \top_1 \mid A \mid \neg C \mid C_1 \sqcap C_2 \mid \exists E.C \mid \exists[\$i]R \mid (\leq k\,[\$i]R) \\
R &::=& \top_n \mid P \mid (\$i/n : C) \mid \neg R \mid R_1 \sqcap R_2 \\
E &::=& \varepsilon \mid R|_{\$i,\$j} \mid E_1 \circ E_2 \mid E_1 \cup E_2 \mid E^*
\end{array}
$$

where i and j denote components of relations, i.e., integers between 1 and $n_{max}$, n
denotes the arity of a relation, i.e., an integer between 2 and $n_{max}$, and k denotes
a nonnegative integer.

³ We could also include domains in the logic, i.e., sets of values such as integer, string, etc. However,
for the sake of simplicity, we do not consider this aspect in this work.

Expressions of the form $(\leq k\,[\$i]R)$ are called number restrictions. In what
follows, we abbreviate $\neg\exists E.\neg C$ with $\forall E.C$, and $(\$i/n : C)$ with $(\$i : C)$ when n
is clear from the context. Also, we consider only concepts and relations that are
well-typed, which means that

— only relations of the same arity n are combined to form expressions of type
$R_1 \sqcap R_2$ (which inherit the arity n), and

— $i \leq n$ whenever i denotes a component of a relation of arity n.

A $\mathcal{DLR}_{reg}$ schema is constituted by a finite set of assertions, of the form

$$C_1 \sqsubseteq C_2 \qquad R_1 \sqsubseteq R_2$$

where $R_1$ and $R_2$ are of the same arity. Note that our notion of schema corresponds
to that of TBox in Description Logics [Baader et al. 2003]. 

The semantics of $\mathcal{DLR}_{reg}$ is specified through the notion of interpretation. An
interpretation $\mathcal{I} = (\Delta^{\mathcal{I}}, \cdot^{\mathcal{I}})$ of a $\mathcal{DLR}_{reg}$ schema S and a set $\mathcal{C}$ (of constants to be
used in queries) is constituted by an interpretation domain $\Delta^{\mathcal{I}}$ and an interpretation
function $\cdot^{\mathcal{I}}$ that assigns

— to each constant c in $\mathcal{C}$ an element $c^{\mathcal{I}}$ of $\Delta^{\mathcal{I}}$ under the unique name assumption,

— to each concept C a subset $C^{\mathcal{I}}$ of $\Delta^{\mathcal{I}}$,

— to each relation R of arity n a subset $R^{\mathcal{I}}$ of $(\Delta^{\mathcal{I}})^n$,

— to each regular expression E a subset $E^{\mathcal{I}}$ of $\Delta^{\mathcal{I}} \times \Delta^{\mathcal{I}}$

such that the conditions in Figure 1 are satisfied. We observe that $\top_1$ denotes
the interpretation domain, while $\top_n$, for n > 1, does not denote the n-ary Cartesian
product of the domain, but only a subset of it that covers all relations of arity n. It
follows from this property that the "$\neg$" constructor on relations is used to express
difference of relations, rather than complement.

An interpretation $\mathcal{I}$ satisfies an assertion $C_1 \sqsubseteq C_2$ (resp., $R_1 \sqsubseteq R_2$) if $C_1^{\mathcal{I}} \subseteq C_2^{\mathcal{I}}$
(resp., $R_1^{\mathcal{I}} \subseteq R_2^{\mathcal{I}}$). An interpretation that satisfies all assertions in a schema S is
called a model of S. It is easy to see that a model of a schema S actually corresponds
to a database conforming to S, i.e., a database satisfying all the constraints
represented by S. A schema is satisfiable if it admits a model. A schema S logically
implies an inclusion assertion $C_1 \sqsubseteq C_2$ (resp., $R_1 \sqsubseteq R_2$) if for every model $\mathcal{I}$ of S
we have that $C_1^{\mathcal{I}} \subseteq C_2^{\mathcal{I}}$ (resp., $R_1^{\mathcal{I}} \subseteq R_2^{\mathcal{I}}$).
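To make the notions of interpretation and satisfaction concrete, the following sketch evaluates expressions over a finite interpretation by following the semantic rules of Figure 1, and checks whether an assertion is satisfied. The encoding of expressions as nested tuples, the dictionary layout of the interpretation, and the names `ev` and `satisfies` are assumptions introduced for this illustration only. Finite interpretations suffice for experimentation, but the definitions in the text do not restrict the domain to be finite.

```python
def ev(e, I):
    """Extension of expression e (a nested tuple) in the finite interpretation I."""
    op = e[0]
    if op == "top1":                       # top_1: the whole domain
        return set(I["domain"])
    if op == "topn":                       # ("topn", n): a chosen subset of domain^n
        return set(I["top"][e[1]])
    if op == "atom":                       # atomic concept or atomic relation
        return set(I["ext"][e[1]])
    if op == "not_c":                      # concept negation
        return set(I["domain"]) - ev(e[1], I)
    if op == "not_r":                      # ("not_r", n, R): difference w.r.t. top_n
        return set(I["top"][e[1]]) - ev(e[2], I)
    if op == "and":                        # intersection of concepts or relations
        return ev(e[1], I) & ev(e[2], I)
    if op == "exists_e":                   # ("exists_e", E, C)
        pairs, c = ev(e[1], I), ev(e[2], I)
        return {d for (d, d2) in pairs if d2 in c}
    if op == "exists_i":                   # ("exists_i", i, R): exists[$i]R
        return {t[e[1] - 1] for t in ev(e[2], I)}
    if op == "atmost":                     # ("atmost", k, i, R): (<= k [$i]R)
        k, i, r = e[1], e[2], ev(e[3], I)
        return {d for d in I["domain"]
                if sum(1 for t in r if t[i - 1] == d) <= k}
    if op == "sel":                        # ("sel", i, n, C): ($i/n : C)
        c = ev(e[3], I)
        return {t for t in I["top"][e[2]] if t[e[1] - 1] in c}
    if op == "eps":                        # identity on the domain
        return {(x, x) for x in I["domain"]}
    if op == "proj":                       # ("proj", R, i, j): R|_{$i,$j}
        return {(t[e[2] - 1], t[e[3] - 1]) for t in ev(e[1], I)}
    if op == "comp":                       # composition of regular expressions
        a, b = ev(e[1], I), ev(e[2], I)
        return {(x, z) for (x, y) in a for (y2, z) in b if y == y2}
    if op == "union":
        return ev(e[1], I) | ev(e[2], I)
    if op == "star":                       # reflexive-transitive closure
        closure, step = {(x, x) for x in I["domain"]}, ev(e[1], I)
        while True:
            new = closure | {(x, z) for (x, y) in closure
                             for (y2, z) in step if y == y2}
            if new == closure:
                return closure
            closure = new
    raise ValueError(f"unknown constructor: {op}")

def satisfies(I, lhs, rhs):
    """True iff I satisfies the assertion lhs [= rhs."""
    return ev(lhs, I) <= ev(rhs, I)

I = {"domain": {"a", "b", "c"},
     "top": {2: {("a", "b"), ("b", "c"), ("c", "a")}},
     "ext": {"P": {("a", "b"), ("b", "c")}, "A": {"c"}}}
print(ev(("not_r", 2, ("atom", "P")), I))
```

In the toy interpretation, the negation of P evaluates to the tuples of $\top_2$ that are not in P, which illustrates why negation on relations behaves as a difference rather than as a complement with respect to the full Cartesian product.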

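It can be shown that $\mathcal{DLR}_{reg}$ is able to capture a great variety of data models with
many forms of constraints. For example, we obtain the entity-relationship model
(including is-a relations on both entities and relations) in a straightforward way
[Calvanese et al. 1995], and an object-oriented data model (extended with several
types of constraints), by restricting the use of existential and universal quantifications
in concept expressions, by restricting the attention to binary relations, and
by eliminating negation, disjunction and regular expressions.

$$
\begin{array}{rcl}
\top_1^{\mathcal{I}} &=& \Delta^{\mathcal{I}} \\
A^{\mathcal{I}} &\subseteq& \Delta^{\mathcal{I}} \\
(\neg C)^{\mathcal{I}} &=& \Delta^{\mathcal{I}} \setminus C^{\mathcal{I}} \\
(C_1 \sqcap C_2)^{\mathcal{I}} &=& C_1^{\mathcal{I}} \cap C_2^{\mathcal{I}} \\
(\exists E.C)^{\mathcal{I}} &=& \{d \in \Delta^{\mathcal{I}} \mid \exists d' \in C^{\mathcal{I}}.\,(d,d') \in E^{\mathcal{I}}\} \\
(\exists[\$i]R)^{\mathcal{I}} &=& \{d \in \Delta^{\mathcal{I}} \mid \exists (d_1,\ldots,d_n) \in R^{\mathcal{I}}.\,d_i = d\} \\
(\leq k\,[\$i]R)^{\mathcal{I}} &=& \{d \in \Delta^{\mathcal{I}} \mid \#\{(d_1,\ldots,d_n) \in R^{\mathcal{I}} \mid d_i = d\} \leq k\} \\[1ex]
\top_n^{\mathcal{I}} &\subseteq& (\Delta^{\mathcal{I}})^n \\
P^{\mathcal{I}} &\subseteq& \top_n^{\mathcal{I}} \\
(\$i/n : C)^{\mathcal{I}} &=& \{(d_1,\ldots,d_n) \in \top_n^{\mathcal{I}} \mid d_i \in C^{\mathcal{I}}\} \\
(\neg R)^{\mathcal{I}} &=& \top_n^{\mathcal{I}} \setminus R^{\mathcal{I}} \\
(R_1 \sqcap R_2)^{\mathcal{I}} &=& R_1^{\mathcal{I}} \cap R_2^{\mathcal{I}} \\[1ex]
\varepsilon^{\mathcal{I}} &=& \{(x,x) \mid x \in \Delta^{\mathcal{I}}\} \\
(R|_{\$i,\$j})^{\mathcal{I}} &=& \{(x_i,x_j) \mid (x_1,\ldots,x_n) \in R^{\mathcal{I}}\} \\
(E_1 \circ E_2)^{\mathcal{I}} &=& E_1^{\mathcal{I}} \circ E_2^{\mathcal{I}} \\
(E_1 \cup E_2)^{\mathcal{I}} &=& E_1^{\mathcal{I}} \cup E_2^{\mathcal{I}} \\
(E^*)^{\mathcal{I}} &=& (E^{\mathcal{I}})^*
\end{array}
$$

Fig. 1. Semantic rules for $\mathcal{DLR}_{reg}$ (P, R, $R_1$, and $R_2$ have arity n)

Compared with the relational model, the following observations point out the kinds of constraints that
can be expressed using $\mathcal{DLR}_{reg}$.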

— Assertions directly express a special case of typed inclusion dependencies, namely 
the one where no projection of relations is used. 

— Unary inclusion dependencies are easily expressible by means of the $\exists[\$i]R$ construct.
For example, $\exists[\$2]P_1 \sqsubseteq \exists[\$3]P_2$ is a unary inclusion dependency between
attribute 2 of $P_1$ and attribute 3 of $P_2$.

— Existence and exclusion dependencies are expressible by means of $\exists$ and $\neg$,
respectively, whereas a limited form of functional dependencies can be expressed
by means of $(\leq 1\,[\$i]R)$. For example, $\top_1 \sqsubseteq (\leq 1\,[\$i]P)$ specifies that attribute
i functionally determines all other attributes of P.

— The possibility of constructing complex expressions provides a special form of
view definition. Indeed, the two assertions $P \sqsubseteq R$, $R \sqsubseteq P$ (where R is a
complex expression) constitute a view definition for P. Notably, views can be freely
used in assertions (even with cyclic references), and, therefore, all the above 
discussed constraints can be imposed not only on atomic relations, but also on 
views. These features make our logic particularly suited for expressing inter- 
schema relationships in the context of information integration [Calvanese et al. 
1998], where it is crucial to be able to state that a certain concept of a schema 
corresponds (by means of inclusion or equivalence) to a view in another schema. 

— Finally, regular expressions can be profitably used to represent in the schema 
inductively defined structures such as sequences and lists, imposing complex con- 
ditions on them. 

One of the distinguishing features of $\mathcal{DLR}_{reg}$ is that it is equipped with a method
for checking logical implication. Indeed, $\mathcal{DLR}_{reg}$ shares EXPTIME-completeness
of schema satisfiability and logical implication with many expressive Description
Logics [Calvanese et al. 2001; Baader et al. 2003] (see below).

We point out that $\mathcal{DLR}_{reg}$ supports only special forms of functional and inclusion
dependencies. Hence the undecidability result of implication for (general) functional 
and inclusion dependencies taken together, shown in [Mitchell 1983; Chandra and 
Vardi 1985], does not apply. 

2.2 Queries 

A query q for a $\mathcal{DLR}_{reg}$ schema is a non-recursive Datalog query, written in the
form:

$$q(\vec{x}) \leftarrow \mathit{conj}_1(\vec{x}, \vec{y}_1, \vec{c}_1) \lor \cdots \lor \mathit{conj}_m(\vec{x}, \vec{y}_m, \vec{c}_m)$$

where each $\mathit{conj}_i(\vec{x}, \vec{y}_i, \vec{c}_i)$ is a conjunction of atoms, and $\vec{x}$, $\vec{y}_i$ (resp. $\vec{c}_i$) are all the
variables (resp. constants) appearing in the conjunction. Each atom has one of the
forms $C(t)$ or $R(\vec{t})$, where

— $t$ and $\vec{t}$ are constants or variables in $\vec{x}$, $\vec{y}_i$, $\vec{c}_i$,

— C and R are respectively concept and relation expressions over S.

The number of variables of $\vec{x}$ is called the arity of q, i.e., the arity of the relation
denoted by the query q.

We observe that the atoms in the queries are arbitrary $\mathcal{DLR}_{reg}$ concepts and
relations, freely used in the assertions of the schema. This distinguishes our approach
from [Donini et al. 1998; Levy and Rousset 1996], where no constraints
can be expressed in the schema on the relations that appear in the queries.

Given an interpretation $\mathcal{I}$ of a schema S, a query q for S of arity n is interpreted as
the set $q^{\mathcal{I}}$ of n-tuples $(o_1, \ldots, o_n)$, with each $o_i \in \Delta^{\mathcal{I}}$, such that, when substituting
each $o_i$ for $x_i$, the formula

$$\exists\vec{y}_1.\mathit{conj}_1(\vec{x}, \vec{y}_1, \vec{c}_1) \lor \cdots \lor \exists\vec{y}_m.\mathit{conj}_m(\vec{x}, \vec{y}_m, \vec{c}_m)$$

evaluates to true in $\mathcal{I}$.
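The interpretation of a query can be sketched for finite interpretations by trying every assignment of domain elements to the variables of each disjunct. The sketch below is deliberately restricted to atoms over atomic concepts and relations (the definition above also allows complex expressions as predicates), stores concepts as sets of 1-tuples, and writes variables with a leading `?`. These conventions and the name `eval_ucq` are assumptions of this example.

```python
from itertools import product

def eval_ucq(head, disjuncts, domain, ext):
    """Answers of q(head) <- disjunct_1 v ... v disjunct_m in a finite interpretation.

    head      : tuple of variables, e.g. ("?x",)
    disjuncts : list of conjunctions, each a list of (predicate, args) atoms
    ext       : predicate name -> set of tuples (concepts as 1-tuples)
    """
    answers = set()
    for atoms in disjuncts:
        variables = sorted({t for _, args in atoms for t in args
                            if t.startswith("?")} | set(head))
        # Brute force: every assignment of domain elements to the variables.
        for values in product(sorted(domain), repeat=len(variables)):
            env = dict(zip(variables, values))
            # Constants evaluate to themselves, variables to their assigned value.
            if all(tuple(env.get(t, t) for t in args) in ext[pred]
                   for pred, args in atoms):
                answers.add(tuple(env[h] for h in head))
    return answers

domain = {"d1", "d2", "c", "p"}
ext = {"CONTROLS": {("d1", "d2")},
       "SOLD": {("d2", "c", "p")},
       "Dept": {("d1",), ("d2",)}}
body = [("CONTROLS", ("?x", "?y")), ("SOLD", ("?y", "?z1", "?z2"))]
print(eval_ucq(("?x",), [body], domain, ext))
```

The existential variables are handled implicitly: a tuple is an answer as soon as one assignment of the remaining variables makes some disjunct true.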

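If q and q' are two queries (of the same arity) for S, we say that q is contained in
q' wrt S, denoted $S \models q \subseteq q'$, if $q^{\mathcal{I}} \subseteq {q'}^{\mathcal{I}}$ for every model $\mathcal{I}$ of S. Given a $\mathcal{DLR}_{reg}$
schema S and two queries for S

$$
\begin{array}{rcl}
q(\vec{x}) &\leftarrow& \mathit{conj}_1(\vec{x}, \vec{y}_1, \vec{c}_1) \lor \cdots \lor \mathit{conj}_m(\vec{x}, \vec{y}_m, \vec{c}_m) \\
q'(\vec{x}) &\leftarrow& \mathit{conj}'_1(\vec{x}, \vec{y}'_1, \vec{c}'_1) \lor \cdots \lor \mathit{conj}'_{m'}(\vec{x}, \vec{y}'_{m'}, \vec{c}'_{m'})
\end{array}
$$

we have that $S \models q \subseteq q'$ iff there is no model $\mathcal{I}$ of S such that, when substituting
suitable objects in $\Delta^{\mathcal{I}}$ for $\vec{x}, \vec{y}_1, \ldots, \vec{y}_m$, the formula

$$
\begin{array}{l}
(\mathit{conj}_1(\vec{x}, \vec{y}_1, \vec{c}_1) \lor \cdots \lor \mathit{conj}_m(\vec{x}, \vec{y}_m, \vec{c}_m)) \;\land \\
\neg\exists\vec{z}_1.\mathit{conj}'_1(\vec{x}, \vec{z}_1, \vec{c}'_1) \land \cdots \land \neg\exists\vec{z}_{m'}.\mathit{conj}'_{m'}(\vec{x}, \vec{z}_{m'}, \vec{c}'_{m'})
\end{array}
$$

evaluates to true in $\mathcal{I}$. In other words, $S \models q \subseteq q'$ if and only if there is no model
of S that makes the formula

$$
\begin{array}{l}
(\mathit{conj}_1(\vec{a}, \vec{b}_1, \vec{c}_1) \lor \cdots \lor \mathit{conj}_m(\vec{a}, \vec{b}_m, \vec{c}_m)) \;\land \\
\neg\exists\vec{z}_1.\mathit{conj}'_1(\vec{a}, \vec{z}_1, \vec{c}'_1) \land \cdots \land \neg\exists\vec{z}_{m'}.\mathit{conj}'_{m'}(\vec{a}, \vec{z}_{m'}, \vec{c}'_{m'})
\end{array}
$$

true, where $\vec{a}, \vec{b}_1, \ldots, \vec{b}_m$ are Skolem constants, i.e., constants not appearing elsewhere
for which the unique name assumption does not hold.

Fig. 2. The entity-relationship diagram for the example in Section 2.3 (diagram not reproduced; surviving entity labels: Company, MainDept)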

Query containment is the problem of checking whether $S \models q \subseteq q'$, where S, q,
and q' are given as input. Query satisfiability is the problem of checking whether 
a given query is interpreted as a non-empty set in at least one model of a given 
schema. 

2.3 Example 

Consider an application where the departments of a given company can be con- 
trolled by other departments, and sold to companies. Every department is con- 
trolled by at most one department, and by at least one main department, possibly 
indirectly. A main department is not controlled by any department. If a main 
department is sold, then all the departments controlled by it are also sold. Finally, 
if a department is sold, then all the departments that, directly or indirectly, control
it are also sold.

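The basic concepts and relations are shown in Figure 2 in the form of an entity-relationship
diagram. The specification of the application in $\mathcal{DLR}_{reg}$ makes use of
the concepts Dept, MainDept, Money, Company, and the relations CONTROLS, SOLD.
In particular, CONTROLS(x, y) means that department x has control over department
y, and SOLD(x, y, z) means that department x has been sold to company y at price z.
The schema S is constituted by the following assertions:

$$
\begin{array}{rcl}
\mathsf{SOLD} &\sqsubseteq& (\$1 : \mathsf{Dept}) \sqcap (\$2 : \mathsf{Company}) \sqcap (\$3 : \mathsf{Money}) \\
\mathsf{CONTROLS} &\sqsubseteq& (\$1 : \mathsf{Dept}) \sqcap (\$2 : \mathsf{Dept}) \\
\mathsf{Dept} &\sqsubseteq& (\leq 1\,[\$2]\mathsf{CONTROLS}) \sqcap \exists(\mathsf{CONTROLS}|_{\$2,\$1})^*.\mathsf{MainDept} \\
\mathsf{MainDept} &\sqsubseteq& \mathsf{Dept} \sqcap \neg\exists[\$2]\mathsf{CONTROLS} \\
\mathsf{MainDept} \sqcap \exists[\$1]\mathsf{SOLD} &\sqsubseteq& \forall(\mathsf{CONTROLS}|_{\$1,\$2})^*.\exists[\$1]\mathsf{SOLD} \\
\mathsf{Dept} \sqcap \exists[\$1]\mathsf{SOLD} &\sqsubseteq& \exists(\mathsf{CONTROLS}|_{\$2,\$1})^*.(\mathsf{MainDept} \sqcap \exists[\$1]\mathsf{SOLD})
\end{array}
$$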

The first two assertions are used to specify the types of the attributes of the re- 
lations. The third and the fourth assertions specify the basic properties of Dept 
and MainDept. It is easy to see that such assertions imply that, in all the models 
of S, the set of CONTROLS links starting from an instance m of MainDept form a 
tree (which we call CONTROLS-tree) with root m. The role of the transitive closure
$(\mathsf{CONTROLS}|_{\$2,\$1})^*$ and the number restrictions is crucial for correctly representing
the above property in the schema. Finally, the last two assertions, each one stating
inclusions between views, specify the company policy for selling departments. Note 
again the use of the transitive closure for this purpose. 

We now consider two queries for the schema S. The first query, called q, is used to
retrieve all the pairs of departments that are controlled by the same department and
that comprise at least one sold department. The second query, called q', retrieves
all the pairs (x, y) of departments such that x has been sold, and y belongs to the
same CONTROLS-tree of x. The queries q and q' are defined as follows:

$$
\begin{array}{rcl}
q(x) &\leftarrow& \mathsf{CONTROLS}(x, y) \land \mathsf{SOLD}(y, z_1, z_2) \\
q'(x) &\leftarrow& \mathsf{Dept}(x) \land \mathsf{SOLD}(x, z_1, z_2)
\end{array}
$$

One can verify that $S \models q \subseteq q'$. Indeed, the schema S imposes that (i) the
CONTROLS relation is typed, so that x in q is a department; (ii) when a department 
is sold, there is a main department (possibly indirectly) controlling it that is also 
sold, and when a main department is sold, all the departments it (directly and 
indirectly) controls are sold as well. 

Also, if we add to q(x) the condition that department x is not sold, we obtain
the query

$$q''(x) \leftarrow \mathsf{CONTROLS}(x, y) \land \mathsf{SOLD}(y, z_1, z_2) \land \neg\mathsf{SOLD}(x, w_1, w_2)$$

which is unsatisfiable.
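The role of the constraints in this example can be observed on small databases. The sketch below hard-codes the six assertions of S as Python checks over a finite database and evaluates q and q' as printed above. The dictionary layout and the function names are assumptions of this illustration, and testing individual databases is only a sanity check: it cannot establish $S \models q \subseteq q'$, which quantifies over all models.

```python
def star(pairs, domain):
    """Reflexive-transitive closure of a binary relation over a finite domain."""
    closure = {(x, x) for x in domain}
    while True:
        new = closure | {(x, z) for (x, y) in closure
                         for (y2, z) in pairs if y == y2}
        if new == closure:
            return closure
        closure = new

def is_model(db):
    """Check the six assertions of the example schema on a finite database."""
    dept, main = db["Dept"], db["MainDept"]
    controls, sold = db["CONTROLS"], db["SOLD"]
    sold_depts = {t[0] for t in sold}           # exists[$1]SOLD
    controlled = {t[1] for t in controls}       # exists[$2]CONTROLS
    down = star(controls, db["domain"])         # (CONTROLS|$1,$2)*
    up = {(y, x) for (x, y) in down}            # (CONTROLS|$2,$1)*
    return all([
        all(a in dept and b in db["Company"] and c in db["Money"]
            for a, b, c in sold),
        all(a in dept and b in dept for a, b in controls),
        all(sum(1 for t in controls if t[1] == d) <= 1
            and any((d, m) in up for m in main) for d in dept),
        all(m in dept and m not in controlled for m in main),
        all(y in sold_depts for m in main & sold_depts
            for (x, y) in down if x == m),
        all(any((d, m) in up for m in main & sold_depts)
            for d in dept & sold_depts),
    ])

def q(db):
    return {x for (x, y) in db["CONTROLS"] if any(t[0] == y for t in db["SOLD"])}

def q_prime(db):
    return {x for x in db["Dept"] if any(t[0] == x for t in db["SOLD"])}

good = dict(domain={"m", "d1", "d2", "c", "p"}, Dept={"m", "d1", "d2"},
            MainDept={"m"}, Company={"c"}, Money={"p"},
            CONTROLS={("m", "d1"), ("d1", "d2")},
            SOLD={("m", "c", "p"), ("d1", "c", "p"), ("d2", "c", "p")})
bad = dict(good, SOLD={("d2", "c", "p")})       # violates the selling policy
print(is_model(good), q(good) <= q_prime(good))
print(is_model(bad), q(bad) <= q_prime(bad))
```

In the second database only the leaf department is sold, so the last assertion fails and the answers of q are no longer among those of q'. This shows that the containment depends on the schema and does not hold for arbitrary databases.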

3. CHECKING QUERY CONTAINMENT 

We address the problem of deciding, given a schema S and two queries q and q' of 
the same arity, whether $S \models q \subseteq q'$. To do so, we make use of a reduction of query
containment to a problem of unsatisfiability in a variant of Propositional Dynamic
Logic, called $\mathrm{CPDL}_g$. In the next subsection, we introduce $\mathrm{CPDL}_g$. Then, we present
the reduction, prove its correctness, and analyze the computational complexity of
the resulting containment algorithm.

3.1 The Propositional Dynamic Logic $\mathrm{CPDL}_g$

Propositional Dynamic Logics are specific modal logics originally proposed as a 
formal system for reasoning about computer program schemas [Fischer and Ladner
1979]. Since then PDLs have been studied extensively and extended in several ways
(see e.g., [Kozen and Tiuryn 1990] for a survey).

Here, we make use of $\mathrm{CPDL}_g$ (studied in [De Giacomo and Lenzerini 1996] in the
context of description logics), which is an extension of Converse PDL [Kozen and
Tiuryn 1990] with graded modalities [Fattorosi-Barnaba and De Caro 1985]. The
syntax of $\mathrm{CPDL}_g$ is as follows (A denotes an atomic formula, $\phi$ an arbitrary formula,
p an atomic program, and r an arbitrary program):

$$
\begin{array}{rcl}
\phi &::=& A \mid \neg\phi \mid \phi_1 \land \phi_2 \mid \langle r\rangle\phi \mid [p]_{\leq k}\phi \mid [p^-]_{\leq k}\phi \\
r &::=& p \mid r_1; r_2 \mid r_1 \cup r_2 \mid r^* \mid \phi? \mid r^-
\end{array}
$$

We use the standard abbreviations, namely T for true, F for false, $\lor$ for disjunction,
$\Rightarrow$ for material implication, and $[r]\phi$ for $\neg\langle r\rangle\neg\phi$.

Fig. 3. Semantic rules for $\mathrm{CPDL}_g$



As usual for PDLs, the semantics of $\mathrm{CPDL}_g$ is based on Kripke structures $M = (S, \cdot^M)$,
where S is a set of states and $\cdot^M$ is a mapping interpreting formulae
as subsets of S and programs as binary relations over S. The semantics of each
construct is shown in Figure 3.

It can be shown that $\mathrm{CPDL}_g$ has typical properties of PDLs, in particular the
connected-model property (if a formula has a model, then it has one that is con- 
nected when viewing it as a graph), the tree-model property (if a formula has a 
model, then it has one that is a tree when viewing it as an undirected graph), and 
EXPTIME-completeness of checking satisfiability of a formula (with the assump- 
tion that numbers in graded modalities are represented in unary) [De Giacomo and 
Lenzerini 1996; Calvanese et al. 2001; Baader et al. 2003]. 
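The Kripke semantics can be explored on finite structures with a small model checker such as the sketch below. Since the semantic rules of Figure 3 are not reproduced in this text, the code rests on a loudly stated assumption: $[p]_{\leq k}\phi$ is taken to hold in a state that has at most k p-successors satisfying $\phi$, which is the reading under which formula (2) of Section 3.2 expresses functionality of the programs $f_i$. The tuple encoding of formulae and programs and the names `prog` and `sat` are likewise assumptions of this example.

```python
def prog(r, M):
    """Binary relation denoted by program r in the finite Kripke structure M."""
    op = r[0]
    if op == "prog":                       # atomic program
        return set(M["rel"].get(r[1], set()))
    if op == "inv":                        # converse r^-
        return {(t, s) for (s, t) in prog(r[1], M)}
    if op == "seq":                        # r1 ; r2
        a, b = prog(r[1], M), prog(r[2], M)
        return {(s, u) for (s, t) in a for (t2, u) in b if t == t2}
    if op == "or":                         # r1 U r2
        return prog(r[1], M) | prog(r[2], M)
    if op == "test":                       # phi?
        return {(s, s) for s in sat(r[1], M)}
    if op == "star":                       # r*
        closure, step = {(s, s) for s in M["states"]}, prog(r[1], M)
        while True:
            new = closure | {(s, u) for (s, t) in closure
                             for (t2, u) in step if t == t2}
            if new == closure:
                return closure
            closure = new
    raise ValueError(f"unknown program constructor: {op}")

def sat(f, M):
    """Set of states of M in which formula f holds."""
    op = f[0]
    if op == "true":
        return set(M["states"])
    if op == "atom":
        return set(M["val"].get(f[1], set()))
    if op == "not":
        return set(M["states"]) - sat(f[1], M)
    if op == "and":
        return sat(f[1], M) & sat(f[2], M)
    if op == "dia":                        # ("dia", r, phi): <r>phi
        target = sat(f[2], M)
        return {s for (s, t) in prog(f[1], M) if t in target}
    if op == "atmost":                     # ("atmost", p, k, phi): [p]_{<=k} phi
        # ASSUMPTION: at most k successors via p satisfy phi.
        pairs, k, target = prog(f[1], M), f[2], sat(f[3], M)
        return {s for s in M["states"]
                if sum(1 for (a, t) in pairs if a == s and t in target) <= k}
    raise ValueError(f"unknown formula constructor: {op}")

# A reified binary tuple t with components o1 and o2.
M = {"states": {"t", "o1", "o2"},
     "rel": {"f1": {("t", "o1")}, "f2": {("t", "o2")}},
     "val": {"T1": {"o1", "o2"}, "T2": {"t"}, "P": {"t"}}}
print(sat(("dia", ("inv", ("prog", "f1")), ("atom", "P")), M))
```

The example structure anticipates the reification used in the next subsection: a state standing for a tuple is linked by functional programs to the states standing for its components.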

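3.2 Reduction of Query Containment to Unsatisfiability in $\mathrm{CPDL}_g$

Our aim is to reduce query containment to a problem of unsatisfiability in $\mathrm{CPDL}_g$.
To this end, we construct a $\mathrm{CPDL}_g$ formula starting from an instance of the query
containment problem. More precisely, if we have to check whether there is no model
of S that makes the formula

$$
\begin{array}{l}
(\mathit{conj}_1(\vec{a}, \vec{b}_1, \vec{c}_1) \lor \cdots \lor \mathit{conj}_m(\vec{a}, \vec{b}_m, \vec{c}_m)) \;\land \\
\neg\exists\vec{z}_1.\mathit{conj}'_1(\vec{a}, \vec{z}_1, \vec{c}'_1) \land \cdots \land \neg\exists\vec{z}_{m'}.\mathit{conj}'_{m'}(\vec{a}, \vec{z}_{m'}, \vec{c}'_{m'})
\end{array}
$$

true, where $\vec{a}, \vec{b}_1, \ldots, \vec{b}_m$ are Skolem constants, we check the unsatisfiability of the
$\mathrm{CPDL}_g$ formula

$$\Phi_{S \not\models q \subseteq q'} = \Phi_S \land \Big(\bigvee_{j=1}^{m} \Phi_{\mathit{conj}_j}\Big) \land \Big(\bigwedge_{j=1}^{m'} \neg\Phi_{\mathit{conj}'_j}\Big) \land \Phi_{aux},$$

constructed as described below.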
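$\Phi_S$: encoding of S

$\Phi_S$ is the translation of S into a $\mathrm{CPDL}_g$ formula, that is based on reification of
n-ary relations, i.e., a tuple in a model of S is represented in a model of $\Phi_{S \not\models q \subseteq q'}$
by a state having one functional link $f_i$ for each tuple component $\$i$. $\Phi_S$ makes
use of the mapping $\sigma(\cdot)$ from $\mathcal{DLR}_{reg}$ expressions to $\mathrm{CPDL}_g$ formulae defined in
Figure 4. The atomic formula $\top_1$ denotes those states that represent objects, while
each atomic formula $\top_n$, with $n \geq 2$, denotes those states that represent tuples of
arity n. We denote with U the program $(\mathit{create} \cup f_1 \cup \cdots \cup f_{n_{max}} \cup \mathit{create}^- \cup f_1^- \cup \cdots \cup f_{n_{max}}^-)^*$,
where $\mathit{create}, f_1, \ldots, f_{n_{max}}$ are all atomic programs used in $\Phi_{S \not\models q \subseteq q'}$. Due
to the connected-model property of $\mathrm{CPDL}_g$, U represents the universal accessibility
relation. Therefore, for a given interpretation, $[U]\phi$ expresses that $\phi$ holds in every
state, and $\langle U\rangle\phi$ expresses that $\phi$ holds in some state.

$$
\begin{array}{rcl}
\sigma(\top_1) &=& \top_1 \\
\sigma(A) &=& A \\
\sigma(\neg C) &=& \top_1 \land \neg\sigma(C) \\
\sigma(C_1 \sqcap C_2) &=& \sigma(C_1) \land \sigma(C_2) \\
\sigma(\exists E.C) &=& \langle\sigma(E)\rangle\sigma(C) \\
\sigma(\exists[\$i]R) &=& \langle f_i^-\rangle\sigma(R) \\
\sigma((\leq k\,[\$i]R)) &=& [f_i^-]_{\leq k}\sigma(R) \\[1ex]
\sigma(\top_n) &=& \top_n \\
\sigma(P) &=& P \\
\sigma((\$i/n : C)) &=& \top_n \land [f_i]\sigma(C) \\
\sigma(\neg R) &=& \top_n \land \neg\sigma(R) \\
\sigma(R_1 \sqcap R_2) &=& \sigma(R_1) \land \sigma(R_2) \\[1ex]
\sigma(R|_{\$i,\$j}) &=& f_i^-;\, \sigma(R)?;\, f_j \\
\sigma(E_1 \circ E_2) &=& \sigma(E_1);\, \sigma(E_2) \\
\sigma(E_1 \cup E_2) &=& \sigma(E_1) \cup \sigma(E_2) \\
\sigma(E^*) &=& \sigma(E)^*
\end{array}
$$

Fig. 4. Mapping $\sigma(\cdot)$ from $\mathcal{DLR}_{reg}$ to $\mathrm{CPDL}_g$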
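$\Phi_S$ is the conjunction of the following formulae:

$$
\begin{array}{lll}
[U](\top_1 \lor \cdots \lor \top_{n_{max}}) & & (1) \\
{[U]}[f_i]_{\leq 1}T & \text{for each } i \in \{1, \ldots, n_{max}\} & (2) \\
{[U]}(\top_n \equiv \langle f_1\rangle\top_1 \land \cdots \land \langle f_n\rangle\top_1 \land [f_{n+1}]F) & \text{for each } n \in \{2, \ldots, n_{max}\} & (3) \\
{[U]}([f_i]F \Rightarrow [f_{i+1}]F) & \text{for each } i \in \{1, \ldots, n_{max}\} & (4) \\
{[U]}(A \Rightarrow \top_1) & \text{for each atomic concept } A & (5) \\
{[U]}(P \Rightarrow \top_n) & \text{for each atomic relation } P \text{ of arity } n & (6) \\
{[U]}(\sigma(C_1) \Rightarrow \sigma(C_2)) & \text{for each assertion } C_1 \sqsubseteq C_2 \text{ in } S & (7) \\
{[U]}(\sigma(R_1) \Rightarrow \sigma(R_2)) & \text{for each assertion } R_1 \sqsubseteq R_2 \text{ in } S & (8)
\end{array}
$$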



The formula (1) above expresses that each state represents an object or a tuple 
of arity between 2 and $n_{max}$. The formula (2) expresses that all programs $f_i$
are functional (i.e., deterministic). The formulae (3) and (4) express that the
states representing tuples of arity n are exactly those connected through programs
$f_1, \ldots, f_n$ to states representing objects, and not connected via programs $f_i$, with
i > n, to any state. The formulae (5) and (6) express that states satisfying atomic
propositions corresponding to atomic concepts (resp. atomic relations of arity n) 
are states representing objects (resp. tuples of arity n). Finally, the formulae (7) 
and (8) encode the assertions in S. 
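The construction of $\Phi_S$ is purely syntactic, so it can be sketched as a function that emits the conjuncts (1) to (8) as strings. In the sketch below, the mapping of Figure 4 is implemented by `sigma`; the ASCII rendering (`&`, `~`, `->`, `<->`, `^-` for converse, `<=k` for graded modalities), the tuple encoding of expressions, and the function names are all assumptions of this example rather than notation from the paper.

```python
def sigma(e):
    """Translate an expression (nested tuple) following the rules of Figure 4."""
    op = e[0]
    if op == "top":      return f"T{e[1]}"                           # ("top", n)
    if op == "atom":     return e[1]                                 # A or P
    if op == "not_c":    return f"(T1 & ~{sigma(e[1])})"
    if op == "not_r":    return f"(T{e[1]} & ~{sigma(e[2])})"        # ("not_r", n, R)
    if op == "and":      return f"({sigma(e[1])} & {sigma(e[2])})"
    if op == "exists_e": return f"<{sigma(e[1])}>{sigma(e[2])}"      # exists E.C
    if op == "exists_i": return f"<f{e[1]}^->{sigma(e[2])}"          # exists[$i]R
    if op == "atmost":   return f"[f{e[2]}^-]<={e[1]} {sigma(e[3])}" # ("atmost", k, i, R)
    if op == "sel":      return f"(T{e[2]} & [f{e[1]}]{sigma(e[3])})"  # ("sel", i, n, C)
    if op == "proj":     return f"(f{e[2]}^- ; {sigma(e[1])}? ; f{e[3]})"  # ("proj", R, i, j)
    if op == "comp":     return f"({sigma(e[1])} ; {sigma(e[2])})"
    if op == "union":    return f"({sigma(e[1])} U {sigma(e[2])})"
    if op == "star":     return f"({sigma(e[1])})*"
    raise ValueError(f"unknown constructor: {op}")

def phi_S(assertions, concepts, relations, n_max):
    """List the conjuncts (1)-(8) of Phi_S as strings.

    assertions : list of (lhs, rhs) expression pairs
    concepts   : names of atomic concepts
    relations  : dict mapping atomic relation names to arities
    """
    out = ["[U](" + " | ".join(f"T{n}" for n in range(1, n_max + 1)) + ")"]      # (1)
    out += [f"[U][f{i}]<=1 T" for i in range(1, n_max + 1)]                      # (2)
    for n in range(2, n_max + 1):                                                # (3)
        links = " & ".join(f"<f{i}>T1" for i in range(1, n + 1))
        out.append(f"[U](T{n} <-> ({links} & [f{n + 1}]F))")
    out += [f"[U]([f{i}]F -> [f{i + 1}]F)" for i in range(1, n_max + 1)]         # (4)
    out += [f"[U]({a} -> T1)" for a in concepts]                                 # (5)
    out += [f"[U]({p} -> T{n})" for p, n in relations.items()]                   # (6)
    out += [f"[U]({sigma(l)} -> {sigma(r)})" for l, r in assertions]             # (7), (8)
    return out

typing = (("atom", "CONTROLS"),
          ("and", ("sel", 1, 2, ("atom", "Dept")), ("sel", 2, 2, ("atom", "Dept"))))
for formula in phi_S([typing], ["Dept"], {"CONTROLS": 2}, n_max=2):
    print(formula)
```

The number of emitted conjuncts grows linearly with $n_{max}$ and with the number of atomic symbols and assertions, and each application of `sigma` is linear in the size of the translated expression.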



$\Phi_{\mathit{conj}_j}$: encoding of each $\mathit{conj}_j(\vec{a}, \vec{b}_j, \vec{c}_j)$

For each $j \in \{1, \ldots, m\}$, the encoding $\Phi_{\mathit{conj}_j}$ of $\mathit{conj}_j(\vec{a}, \vec{b}_j, \vec{c}_j)$ makes use of special
atomic propositions, called name-formulae, whose distinguishing properties are
specified by $\Phi_{aux}$ (see later). Specifically, one name-formula $N_t$ is introduced for
each term t in $\vec{a}, \vec{b}_j, \vec{c}_j$, and one name-formula $N_{\vec{t}}$ for each tuple $\vec{t}$ such that for
some R, $R(\vec{t})$ appears in $\mathit{conj}_j(\vec{a}, \vec{b}_j, \vec{c}_j)$. A name-formula assigns a name to a
term t (resp. tuple $\vec{t}$), which allows for identifying in a model certain states which
correspond to t (resp. the reified counterpart of $\vec{t}$). The distinguishing properties of
name-formulae guarantee that these states share some crucial properties that allow
us to isolate a single state as a representative of t (resp. $\vec{t}$).

Once we have name-formulae in place, we define $\Phi_{\mathit{conj}_j}$ as the conjunction of the
following formulae:

(1) for each name-formula $N_{\vec{t}}$ corresponding to a tuple $\vec{t} = (t_1, \ldots, t_n)$ appearing
in $\mathit{conj}_j(\vec{a}, \vec{b}_j, \vec{c}_j)$

$$
\begin{array}{ll}
[U](N_{\vec{t}} \equiv \langle f_1\rangle N_{t_1} \land \cdots \land \langle f_n\rangle N_{t_n} \land [f_{n+1}]F) & \\
{[U]}(N_{t_i} \Rightarrow (\langle f_i^-\rangle N_{\vec{t}} \land [f_i^-]_{\leq 1} N_{\vec{t}})) & \text{for each } i \in \{1, \ldots, n\}
\end{array}
$$

(2) for each atom $C(t)$ in $\mathit{conj}_j(\vec{a}, \vec{b}_j, \vec{c}_j)$

$$[U](N_t \Rightarrow \sigma(C))$$

(3) for each atom $R(\vec{t})$ in $\mathit{conj}_j(\vec{a}, \vec{b}_j, \vec{c}_j)$

$$[U](N_{\vec{t}} \Rightarrow \sigma(R))$$

Intuitively, $\Phi_{\mathit{conj}_j}$ expresses the relationships between terms and tuples in
$\mathit{conj}_j(\vec{a}, \vec{b}_j, \vec{c}_j)$ by using reification and name-formulae. In particular, the formulae
(1) relate the name-formulae corresponding to tuples to the name-formulae
corresponding to their components. Each formula (2) (resp. (3)) expresses that the
states satisfying the name-formula corresponding to a term (resp. tuple) appearing
in an atom also satisfy the formula corresponding to the predicate of the atom.
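The encoding of a single conjunction can also be sketched as a string generator. The code below emits the three groups of formulae for a list of atoms whose predicates are assumed to be already translated by $\sigma$ (atomic predicates translate to themselves). It reads the first formula of group (1) as an equivalence, which is an assumption about the intended connective, and it uses the hypothetical ASCII notation `N[...]` for name-formulae, `^-` for converse, and `<=1` for the graded modality.

```python
def phi_conj(atoms):
    """Conjuncts encoding one conjunction of atoms.

    atoms : list of (sigma_of_predicate, args); a 1-tuple of args is a
            concept atom C(t), a longer tuple is a relation atom R(t1,...,tn).
    """
    out = []
    # Group (1): relate each tuple name to the names of its components.
    for t in sorted({args for _, args in atoms if len(args) > 1}):
        n = len(t)
        name = "N[" + ",".join(t) + "]"
        comps = " & ".join(f"<f{i}>N[{ti}]" for i, ti in enumerate(t, 1))
        out.append(f"[U]({name} <-> ({comps} & [f{n + 1}]F))")
        for i, ti in enumerate(t, 1):
            out.append(f"[U](N[{ti}] -> (<f{i}^->{name} & [f{i}^-]<=1 {name}))")
    # Groups (2) and (3): a named term or tuple satisfies its predicate.
    for pred, args in atoms:
        name = "N[" + ",".join(args) + "]"
        out.append(f"[U]({name} -> {pred})")
    return out

for formula in phi_conj([("CONTROLS", ("a", "b")), ("Dept", ("a",))]):
    print(formula)
```

Each tuple of arity n contributes n + 1 formulae to group (1), so the size of the encoding is linear in the size of the conjunction.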

$\Phi_{\mathit{conj}'_j}$: encoding of each $\exists\vec{z}_j.\mathit{conj}'_j(\vec{a}, \vec{z}_j, \vec{c}'_j)$

Now consider a $j \in \{1, \ldots, m'\}$. We construct the formula $\Phi_{\mathit{conj}'_j}$ as a disjunction of
formulae, one for each possible partition of the variables $\vec{z}_j$ in $\exists\vec{z}_j.\mathit{conj}'_j(\vec{a}, \vec{z}_j, \vec{c}'_j)$.
More precisely, to build one such formula, we consider a partition $\pi$ of the variables
$\vec{z}_j$. Then, for each equivalence class in the partition we choose a variable as a representative,
and substitute in $\exists\vec{z}_j.\mathit{conj}'_j(\vec{a}, \vec{z}_j, \vec{c}'_j)$ all other variables in the same equivalence
class by the representative, thus obtaining a formula $\exists\vec{w}_\pi.\mathit{conj}'_j(\vec{a}, \vec{w}_\pi, \vec{c}'_j)$.
Now, from such a formula we build a corresponding $\mathrm{CPDL}_g$ formula by making use
of a special graph, called tuple-graph, which intuitively reflects the dependencies
between variables and tuples resulting from the appearance of the variables in the
atoms of $\exists\vec{w}_\pi.\mathit{conj}'_j(\vec{a}, \vec{w}_\pi, \vec{c}'_j)$⁴. A tuple-graph is a directed graph with nodes labeled
by $\mathrm{CPDL}_g$ formulae and edges labeled by $\mathrm{CPDL}_g$ programs, formed as follows:

— There is one node $t$ for each term $t$ in $\vec a$, $\vec w_\pi$, $\vec c'_j$, and one node $\vec t$ for each tuple
$\vec t$ such that $R(\vec t)$ appears in $\exists \vec w_\pi . conj'_j(\vec a, \vec w_\pi, \vec c'_j)$. Each node $t$ is labeled by all
$\sigma(C)$ such that $C(t)$ appears in $\exists \vec w_\pi . conj'_j(\vec a, \vec w_\pi, \vec c'_j)$. Each node $\vec t$ is labeled by
all $\sigma(R)$ such that $R(\vec t)$ appears in $\exists \vec w_\pi . conj'_j(\vec a, \vec w_\pi, \vec c'_j)$.

— There is one edge labeled by $f_i$ from the node $\vec t = (t_1, \ldots, t_n)$ to the node $t_i$,
$i \in \{1, \ldots, n\}$, for each tuple $\vec t$ such that $R(\vec t)$ appears in $\exists \vec w_\pi . conj'_j(\vec a, \vec w_\pi, \vec c'_j)$.

$^4$ The tuple-graph is similar to the graph used in [Chekuri and Rajaraman 1997] to detect cyclic
dependencies between variables.

ACM Transactions on Computational Logic, Vol. V, No. N, February 2008.

Notice that dividing the variables $\vec z_j$ in $\exists \vec z_j . conj'_j(\vec a, \vec z_j, \vec c'_j)$ in all possible ways
into equivalence classes and replacing equivalent variables by one representative
corresponds to introducing equalities between variables in all possible ways. Such
equalities allow us to take into account that a cycle in the tuple-graph can in fact
be eliminated, and become simply a chain, when different variables are assigned
the same object. As will become clear in the following, the distinction between
variables appearing in cycles in the tuple-graph and those that do not is indeed
necessary for the correctness of the proposed technique for query containment under
constraints.
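The enumeration of partitions described above can be sketched in Python. This is only an illustration under stated assumptions, not the construction used in the paper: the names `partitions` and `merge_variables`, and the encoding of an atom as a predicate name paired with a tuple of term names, are hypothetical.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical encoding of an atom: (predicate name, tuple of term names).
Atom = Tuple[str, Tuple[str, ...]]

def partitions(items: List[str]) -> Iterator[List[List[str]]]:
    """Yield every partition of `items` into non-empty equivalence classes."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in partitions(rest):
        # Put `first` into each existing class in turn ...
        for i in range(len(smaller)):
            yield smaller[:i] + [[first] + smaller[i]] + smaller[i + 1:]
        # ... or into a new class of its own.
        yield [[first]] + smaller

def merge_variables(atoms: List[Atom], partition: List[List[str]]) -> List[Atom]:
    """Replace each variable by the representative (first element) of its class."""
    rep: Dict[str, str] = {v: cls[0] for cls in partition for v in cls}
    return [(p, tuple(rep.get(t, t) for t in terms)) for p, terms in atoms]

# Example: one conjunction with existential variables z1, z2, z3.
atoms = [("R", ("a", "z1")), ("R", ("z1", "z2")), ("R", ("z2", "z3"))]
for pi in partitions(["z1", "z2", "z3"]):
    print(pi, merge_variables(atoms, pi))
```

Each printed line corresponds to one choice of equalities among the variables; terms that are not in any class (here the constant `a`) are left unchanged.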

In the following, we call formula-template a CPDL$_g$ formula in which
formula-placeholders occur that later will be substituted by actual formulae. From the
tuple-graph of $\exists \vec w_\pi . conj'_j(\vec a, \vec w_\pi, \vec c'_j)$ we build a CPDL$_g$ formula-template $\delta^\pi$, and to do so
we have to consider that in general the tuple-graph is composed of several connected
components. For the $i$-th connected component we build a formula-template $\delta^\pi_i$ by
choosing a starting node $t_0$ (corresponding to a term), performing a depth-first
visit of the corresponding component, and building the formula in a postorder
fashion. We describe the construction by defining a visiting function $V$, which,
given a node $u$ of the tuple-graph, returns the corresponding formula-template, and
as a side effect marks the nodes of the graph that it visits.

— If $u = t$, then $V(t)$ marks $t$, and returns the conjunction of:

(i) $t$ itself, used as a placeholder, and every formula labeling the node $t$;

(ii) for each edge $(\vec t, t)$ labeled by $f_i$ (i.e., $t = t_i$ in $\vec t$) such that $\vec t$ is not marked
yet, the formula $\langle f_i^-\rangle V(\vec t)$.

— If $u = \vec t = (t_1, \ldots, t_n)$, then $V(\vec t)$ marks $\vec t$, and returns the conjunction of:

(i) $\vec t$ itself, used as a placeholder, and every formula labeling the node $\vec t$;

(ii) for each edge $(\vec t, t_i)$ labeled by $f_i$, such that $t_i$ is not marked yet, the formula
$\langle f_i\rangle V(t_i)$;

(iii) for each edge $(\vec t, t_i)$ labeled by $f_i$, such that $t_i$ is already marked, the formula
$\langle f_i\rangle t_i$.

Then the formula-template $\delta^\pi_i$ for the $i$-th connected component is defined as $V(t_0)$,
where $t_0$ is the starting node chosen for the visit.
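The visiting function $V$ can be sketched as a recursive depth-first traversal. The following Python fragment is a minimal sketch under stated assumptions: term nodes are strings, tuple nodes are Python tuples of term names, formula-templates are rendered as plain strings, and the names `placeholder` and `visit` are hypothetical. The atom $r(a_1, a_2, z)$ in the usage lines is chosen only for illustration.

```python
from typing import Dict, List, Set, Tuple, Union

Term = str
TupleNode = Tuple[str, ...]
Node = Union[Term, TupleNode]

def placeholder(u: Node) -> str:
    """Render a node as a placeholder name."""
    return "(" + ",".join(u) + ")" if isinstance(u, tuple) else u

def visit(u: Node, labels: Dict[Node, List[str]],
          tuples: List[TupleNode], marked: Set[Node]) -> str:
    """Return V(u) as a string and mark every node reached from u."""
    marked.add(u)
    parts = [placeholder(u)] + labels.get(u, [])
    if isinstance(u, tuple):
        # Tuple node: follow each edge f_i to the i-th component.
        for i, t in enumerate(u, start=1):
            if t not in marked:
                parts.append(f"⟨f{i}⟩{visit(t, labels, tuples, marked)}")
            else:
                parts.append(f"⟨f{i}⟩{placeholder(t)}")
    else:
        # Term node: follow each edge f_i backwards to a tuple not yet marked.
        for tup in tuples:
            for i, t in enumerate(tup, start=1):
                if t == u and tup not in marked:
                    parts.append(f"⟨f{i}^-⟩{visit(tup, labels, tuples, marked)}")
    return "(" + " & ".join(parts) + ")"

# Tuple-graph of the single atom r(a1, a2, z), visited from the term a1.
tuples = [("a1", "a2", "z")]
labels = {("a1", "a2", "z"): ["r"]}
marked: Set[Node] = set()
print(visit("a1", labels, tuples, marked))
```

The placeholders in the resulting string (`a1`, `a2`, `z`, and `(a1,a2,z)`) are the positions that the later substitution step replaces by name-formulae or by $\top_n$.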

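The formula-template $\delta^\pi$ for the whole tuple-graph of $\exists \vec w_\pi . conj'_j(\vec a, \vec w_\pi, \vec c'_j)$,
composed of $\ell \geq 1$ connected components, is

$\langle U\rangle\delta^\pi_1 \wedge \cdots \wedge \langle U\rangle\delta^\pi_\ell$

where $\delta^\pi_1, \ldots, \delta^\pi_\ell$ are the formula-templates corresponding to all the connected
components in the tuple-graph of $\exists \vec w_\pi . conj'_j(\vec a, \vec w_\pi, \vec c'_j)$.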

Now we are ready to define the CPDL$_g$ formula $\varphi_\pi$ corresponding to a partition
$\pi$ of the variables $\vec z_j$. The formula $\varphi_\pi$ consists of the disjunction of all formulae
obtained by replacing in the formula-template $\delta^\pi$

(i) each placeholder $\vec t$ by $\top_n$, where $n$ is the arity of the tuple $\vec t$;

(ii) each placeholder $d$ in $\vec a$, $\vec c'_j$ by the name-formula $N_d$;

(iii) each placeholder $w_i$ corresponding to a variable not occurring in a cycle in the
tuple-graph by $\top_1$;

(iv) each placeholder $w_i$ corresponding to a variable occurring in a cycle in the
tuple-graph by each of the name-formulae $N_t$ corresponding to a term in $\vec a$,
$\vec b_1, \ldots, \vec b_m$, $\vec c_1, \ldots, \vec c_m$ occurring in $q$ or to a term in $\vec c'_1, \ldots, \vec c'_{m'}$ occurring in $q'$.

Observe that the number of such disjuncts in $\varphi_\pi$ is $O(\ell_1^{\ell'_2})$, where $\ell_1$ is the number of
variables and constants in $q$ plus the number of constants in $q'$, and $\ell'_2$ is the number
of variables $w_i$ occurring in a cycle in the tuple-graph for $\exists \vec w_\pi . conj'_j(\vec a, \vec w_\pi, \vec c'_j)$.

Since $\varphi_\pi$ corresponds to one possible partition of the variables $\vec z_j$, we obtain the
formula $\Phi_{conj'_j}$ as the disjunction of all formulae $\varphi_\pi$, one for each possible partition
$\pi$ of the variables $\vec z_j$. The number of such disjuncts is $O(2^{\ell_2})$, where $\ell_2$ is the
number of variables $\vec z_j$.

Therefore, the total number of disjuncts for $\Phi_{conj'_j}$ is $O(\ell_1^{O(\ell_2)})$.

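$\Phi_{aux}$: encoding of constants and variables

Let $\Phi' = \Phi_{\mathcal S} \wedge (\bigvee_{j=1}^{m} \Phi_{conj_j}) \wedge (\bigwedge_{j=1}^{m'} \neg\Phi_{conj'_j})$, and let $N_1, \ldots, N_K$ be all
name-formulae in $\Phi'$. $\Phi_{aux}$ is formed by the conjunction of:

— the formula $\langle create\rangle N_1 \wedge \cdots \wedge \langle create\rangle N_K$, which expresses the existence of a state
satisfying a name-formula $N_i$, for each $i \in \{1, \ldots, K\}$;

— one formula of the form $[U](N_{c_i} \Rightarrow \neg N_{c_j})$ for each pair of distinct constants $c_i$,
$c_j$ appearing in the queries (not Skolem constants);

— one formula of the form $[U](N_i \wedge \phi \Rightarrow [U](N_i \Rightarrow \phi))$ for each name-formula $N_i$,
$i \in \{1, \ldots, K\}$, and each formula $\phi$ such that$^5$:

(a) $\phi \in CL(\Phi')$,

(b) $\phi = \langle \bar r\rangle\phi'$ with $\langle r\rangle\phi' \in CL(\Phi')$, and

(c) $\phi = \langle \bar r'; p\rangle N_j$ with $r' \in Pre(r)$, $p = f \mid f^-$, and $r$, $f$, $N_j$ occurring in $CL(\Phi')$,

where $\bar r$ is defined inductively as follows:

$\bar p = p; (\bigwedge_i \neg N_i)?$
$\overline{r_1; r_2} = \bar r_1; \bar r_2$
$\overline{r_1 \cup r_2} = \bar r_1 \cup \bar r_2$
$\overline{r_1^*} = \bar r_1^*$
$\overline{\phi?} = \phi?$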

The role of $\Phi_{aux}$ is to enforce that, in every model of $\Phi_{\mathcal S \not\models q \subseteq q'}$, for each $N_k$,
one representative state can be singled out among those satisfying $N_k$. This would
be trivially obtained if we could force all these states to satisfy exactly the same
formulae of the logic. $\Phi_{aux}$ forces a weaker condition, namely that these states
satisfy the same formulae in the finite set (whose size is polynomial with respect to
$\Phi'$) described above. Theorem 3.4 shows that this is sufficient for our purposes.

$^5$ $CL(\phi)$ is the Fischer-Ladner closure of a CPDL$_g$ formula $\phi$, and $Pre(r)$ is the set of "prefixes" of
a program $r$ [De Giacomo and Lenzerini 1996].

Fig. 5. A model of $\Phi_{\mathcal S \not\models q \subseteq q'}$

We illustrate the encoding of the containment problem $\mathcal S \models q \subseteq q'$ into
unsatisfiability of the CPDL$_g$ formula $\Phi_{\mathcal S \not\models q \subseteq q'}$ by means of the following example.

Example 3.1. Consider two queries 



over a schema $\mathcal S$ such that $\mathcal S \not\models q \subseteq q'$. Figure 5 schematically shows a model of the
formula $\Phi_{\mathcal S \not\models q \subseteq q'}$ that represents a counterexample to the containment. Indeed, the
model contains a state in which $p$ holds that, being connected to $s_{a_1}$ and $s_{a_2}$ by
means of $f_1$ and $f_2$, respectively, represents a tuple $(a_1, a_2)$ that satisfies $p$. Since
$s_{a_1}$ and $s_{a_2}$ satisfy $N_{a_1}$ and $N_{a_2}$, respectively, and $\neg\Phi_{conj'} = [U](N_{a_1} \Rightarrow [f_1^-](r \Rightarrow
[f_2](\neg N_{a_2} \vee [f_3]\mathbf{F})))$ is true in $s_{root}$, it follows that $s_{a_1}$ satisfies $[f_1^-](r \Rightarrow [f_2]\neg N_{a_2})$.
Therefore, in the model there is no state satisfying $r$ representing a tuple $(a_1, a_2, z)$.

3.3 Correctness of the Reduction 

By exploiting the properties of the encoding $\Phi_{\mathcal S \not\models q \subseteq q'}$, we can now prove decidability
of query containment in our case.

We say that a tuple-graph $g$ is satisfied in an interpretation $\mathcal M$ for $\Phi_{\mathcal S \not\models q \subseteq q'}$ if
there exists a homomorphism $\eta$ mapping the nodes of $g$ to states of $\mathcal M$ such that:

— if a node $u$ of $g$ is a (possibly Skolem) constant, then $\eta(u) \in N_u^{\mathcal M}$;

— if a node $u$ of $g$ is labeled by a formula $\phi$, then $\eta(u) \in \phi^{\mathcal M}$;

— if an edge $(u, u')$ of $g$ is labeled by a program $f$, then $(\eta(u), \eta(u')) \in f^{\mathcal M}$.

Given a formula-template $\phi$ and a substitution $\theta$ of its placeholders, we denote
by $\phi\theta$ the formula obtained from $\phi$ by substituting the placeholders according to $\theta$.

Lemma 3.2. Let $g$ be a connected component of a tuple-graph, $\delta$ the corresponding
formula-template, and $\mathcal M$ a CPDL$_g$ interpretation. If there exists a substitution
$\theta$ of the placeholders such that $(\delta\theta)^{\mathcal M}$ is not empty, then $g$ is satisfied in $\mathcal M$.

Proof. If $(\delta\theta)^{\mathcal M}$ is not empty then it is possible to define a homomorphism $\eta$
as follows. Let $s_t$ (resp. $s_{\vec t}$) be the state of $\mathcal M$ that is used in satisfying $\delta\theta$ in the
position corresponding to $t$ (resp. $\vec t$); then $\eta(t) = s_t$ (resp. $\eta(\vec t) = s_{\vec t}$). □

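Theorem 3.3. Let $\mathcal S$ be a schema, $q$, $q'$ two queries of the same arity, and
$\Phi_{\mathcal S \not\models q \subseteq q'}$ the formula obtained as specified above. If $\mathcal S \not\models q \subseteq q'$ then $\Phi_{\mathcal S \not\models q \subseteq q'}$ is
satisfiable.

Proof. It suffices to consider a model $\mathcal I = (\Delta^{\mathcal I}, \cdot^{\mathcal I})$ of $\mathcal S$ that makes the following
formula true:

$(conj_1(\vec a, \vec b_1, \vec c_1) \vee \cdots \vee conj_m(\vec a, \vec b_m, \vec c_m)) \wedge
\neg\exists \vec z_1 . conj'_1(\vec a, \vec z_1, \vec c'_1) \wedge \cdots \wedge \neg\exists \vec z_{m'} . conj'_{m'}(\vec a, \vec z_{m'}, \vec c'_{m'})$

From $\mathcal I$ build a reified CPDL$_g$ interpretation $\mathcal M = (S, \cdot^{\mathcal M})$ for $\Phi_{\mathcal S \not\models q \subseteq q'}$ as follows:

— $S = \Delta^{\mathcal I} \cup \{s_{root}\} \cup \bigcup_{n \in \{2, \ldots, n_{max}\}} \{ s_{\vec t} \mid \vec t \in \top_n^{\mathcal I} \}$;

— for each $n \in \{2, \ldots, n_{max}\}$, for each $(t_1, \ldots, t_n) \in \top_n^{\mathcal I}$, we have $s_{(t_1, \ldots, t_n)} \in \top_n^{\mathcal M}$,
and $(s_{(t_1, \ldots, t_n)}, t_i) \in f_i^{\mathcal M}$ with $i \in \{1, \ldots, n\}$;

— for each atomic relation $P$, for each $(t_1, \ldots, t_n) \in P^{\mathcal I}$, we have $s_{(t_1, \ldots, t_n)} \in P^{\mathcal M}$;

— $\top_1^{\mathcal M} = \Delta^{\mathcal I}$ and for each atomic concept $A$, we have $A^{\mathcal M} = A^{\mathcal I}$;

— for each (possibly Skolem) constant $t$ occurring in $conj_1(\vec a, \vec b_1, \vec c_1) \vee \cdots \vee
conj_m(\vec a, \vec b_m, \vec c_m)$ we have $N_t^{\mathcal M} = \{t\}$; similarly for each tuple $\vec t$ of (Skolem)
constants occurring in $conj_1(\vec a, \vec b_1, \vec c_1) \vee \cdots \vee conj_m(\vec a, \vec b_m, \vec c_m)$ we have $N_{\vec t}^{\mathcal M} = \{s_{\vec t}\}$;

— for each (possibly Skolem) constant $t$ occurring in $conj_1(\vec a, \vec b_1, \vec c_1) \vee \cdots \vee
conj_m(\vec a, \vec b_m, \vec c_m)$ we have $(s_{root}, t) \in create^{\mathcal M}$; similarly for each tuple $\vec t$ of
(Skolem) constants occurring in $conj_1(\vec a, \vec b_1, \vec c_1) \vee \cdots \vee conj_m(\vec a, \vec b_m, \vec c_m)$ we have
$(s_{root}, s_{\vec t}) \in create^{\mathcal M}$.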

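Next we show that $\mathcal M$ is a model of the formula $\Phi_{\mathcal S \not\models q \subseteq q'}$. It is immediate to
verify that

(1) $s_{root} \in \Phi_{\mathcal S}^{\mathcal M}$ (by construction, considering that $\mathcal I$ is a model of $\mathcal S$);

(2) $s_{root} \in \Phi_{conj_j}^{\mathcal M}$ for some $j \in \{1, \ldots, m\}$ (by construction, considering that $\mathcal I$
satisfies $conj_1(\vec a, \vec b_1, \vec c_1) \vee \cdots \vee conj_m(\vec a, \vec b_m, \vec c_m)$);

(3) $s_{root} \in \Phi_{aux}^{\mathcal M}$ (by construction, considering that name-formulae are interpreted
as singletons in $\mathcal M$).

It remains to show that $s_{root} \notin \Phi_{conj'_j}^{\mathcal M}$ for each $j \in \{1, \ldots, m'\}$. Suppose not,
that is, suppose that $s_{root} \in \Phi_{conj'_j}^{\mathcal M}$ for some $j \in \{1, \ldots, m'\}$. Then there exists
a partition $\pi$ of the variables $\vec z_j$ in $\exists \vec z_j . conj'_j(\vec a, \vec z_j, \vec c'_j)$ such that $s_{root} \in \varphi_\pi^{\mathcal M}$. This
in turn implies that there is a substitution $\theta$ of the placeholders in the
formula-template $\langle U\rangle\delta^\pi_1 \wedge \cdots \wedge \langle U\rangle\delta^\pi_\ell$ such that $s_{root} \in ((\langle U\rangle\delta^\pi_1 \wedge \cdots \wedge \langle U\rangle\delta^\pi_\ell)\theta)^{\mathcal M}$. But then
we have $s_{root} \in ((\langle U\rangle\delta^\pi_1)\theta \wedge \cdots \wedge (\langle U\rangle\delta^\pi_\ell)\theta)^{\mathcal M}$, i.e., for each $i \in \{1, \ldots, \ell\}$, there
is a state in $(\delta^\pi_i\theta)^{\mathcal M}$. By Lemma 3.2 this implies that, for each $i \in \{1, \ldots, \ell\}$,
the connected component $g_i$ corresponding to $\delta^\pi_i$ of the tuple-graph is satisfied in
$\mathcal M$, and hence, by construction, the corresponding part of $\exists \vec z_j . conj'_j(\vec a, \vec z_j, \vec c'_j)$ is
satisfied in $\mathcal I$. Since this is true for all connected components, we get that the
whole $\exists \vec z_j . conj'_j(\vec a, \vec z_j, \vec c'_j)$ is satisfied in $\mathcal I$, contradicting the fact that $\mathcal I$ makes
$\neg\exists \vec z_j . conj'_j(\vec a, \vec z_j, \vec c'_j)$ true. □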

We say that a model of $\Phi_{\mathcal S \not\models q \subseteq q'}$ is tuple-admissible if there is no pair of states
that represent the same reified tuple. We say that a model of $\Phi_{\mathcal S \not\models q \subseteq q'}$ is admissible
if it is tuple-admissible and each name-formula is true in exactly one state. We
say that a model $\mathcal M = (S, \cdot^{\mathcal M})$ of $\Phi_{\mathcal S \not\models q \subseteq q'}$ is a pseudo-tree admissible model if it is
admissible and has the following form:

— it has a distinguished state $s_{root}$, and $K$ not necessarily distinct states
$s_{N_1}, \ldots, s_{N_K}$, one for each name-formula $N_i$, such that $N_i^{\mathcal M} = \{s_{N_i}\}$;

— $create^{\mathcal M} = \{(s_{root}, s_{N_i}) \mid i \in \{1, \ldots, K\}\}$;

— each maximal connected component of $\mathcal M \setminus (\{s_{root}\} \cup \{s_{N_i} \mid i \in \{1, \ldots, K\}\})$ is
a tree, when viewed as an undirected graph.

Notice that the subgraph induced by $\mathcal M \cap \{s_{N_i} \mid i \in \{1, \ldots, K\}\}$ is, instead, an
arbitrary graph.

The following theorem shows that, w.r.t. satisfiability, one can restrict the atten- 
tion to pseudo-tree admissible models. 

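Theorem 3.4. Let $\mathcal S$ be a schema, $q$, $q'$ two queries of the same arity, and
$\Phi_{\mathcal S \not\models q \subseteq q'}$ the formula obtained as specified above. If $\Phi_{\mathcal S \not\models q \subseteq q'}$ is satisfiable then it
has a pseudo-tree admissible model.

Proof. By the tree-model property, $\Phi_{\mathcal S \not\models q \subseteq q'}$ admits a tree-model $\mathcal M = (S, \cdot^{\mathcal M})$,
in which obviously there is no pair of states that represent the same reified tuple.
Let $s_{root} \in \Phi_{\mathcal S \not\models q \subseteq q'}^{\mathcal M}$ be the root of $\mathcal M$. We transform $\mathcal M$ into a new model $\mathcal M' =
(S', \cdot^{\mathcal M'})$ with $S' \subseteq S$, which interprets name-formulae as singletons and is still
tuple-admissible, as follows. For each $N_i$, $i \in \{1, \ldots, K\}$, we select a state $s_{N_i}$
among the states $s \in N_i^{\mathcal M}$ such that $(s_{root}, s) \in create^{\mathcal M}$. Then we define:

$create^{\mathcal M'} = \{(s_{root}, s_{N_i}) \in create^{\mathcal M} \mid i \in \{1, \ldots, K\}\}$

$p^{\mathcal M'} = (p^{\mathcal M} \setminus (\{(s_{N_i}, s) \in p^{\mathcal M} \mid s \in N_j^{\mathcal M}, i, j \in \{1, \ldots, K\}\} \cup
\{(s, s_{N_j}) \in p^{\mathcal M} \mid s \in N_i^{\mathcal M}, i, j \in \{1, \ldots, K\}\}))
\cup \{(s_{N_i}, s_{N_j}) \mid (s_{N_i}, s) \in p^{\mathcal M}, s \in N_j^{\mathcal M}, i, j \in \{1, \ldots, K\}\}$,
for each atomic program $p$ except $create$

$N_i^{\mathcal M'} = \{s_{N_i}\}$, for each name-formula $N_i$, $i \in \{1, \ldots, K\}$

$A^{\mathcal M'} = A^{\mathcal M} \cap S'$, for each atomic formula $A$ except name-formulae

$S' = \{s_{root}\} \cup \{s \in S \mid (s_{root}, s) \in create^{\mathcal M'} \circ (\bigcup_p (p^{\mathcal M'} \cup (p^-)^{\mathcal M'}))^*\}$

It is possible to show, by using the construction in Lemma 5 of [De Giacomo and
Lenzerini 1996]$^6$, that for each $\phi \in CL(\Phi_{\mathcal S \not\models q \subseteq q'})$ and for each state $s \in S'$

$s \in \phi^{\mathcal M'}$ if and only if $s \in \phi^{\mathcal M}$.

Hence, since $\Phi_{\mathcal S \not\models q \subseteq q'} \in CL(\Phi_{\mathcal S \not\models q \subseteq q'})$ and $s_{root} \in (\Phi_{\mathcal S \not\models q \subseteq q'})^{\mathcal M}$, we get the
thesis. □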

For pseudo-tree admissible models one can prove the "converse" of Lemma 3.2. 

Lemma 3.5. Let $g$ be a tuple-graph, $\phi$ the corresponding formula-template, and
$\mathcal M$ a CPDL$_g$ pseudo-tree admissible model of $\Phi_{\mathcal S \not\models q \subseteq q'}$. Then we have: if there exists
a homomorphism $\eta$ from $g$ to $\mathcal M$ such that nodes corresponding to variables in $g$
are mapped either to states representing (possibly Skolem) constants or to distinct
states, then there exists a substitution $\theta$ of the placeholders in $\phi$ such that $(\phi\theta)^{\mathcal M}$
is not empty.

Proof. We first observe that, since $\mathcal M$ is a pseudo-tree admissible model, and $\eta$
assigns all variables in $g$ not assigned to states representing (Skolem) constants to
distinct states, we have that, if a variable $w$ occurs in a cycle in $g$, then the state
$\eta(w)$ assigned to $w$ must be one representing a (possibly Skolem) constant.

Hence we can define $\theta$ as the substitution that:

— replaces each placeholder that corresponds to a variable $w$ occurring on a cycle in
$g$, and thus such that $\eta(w)$ is a (possibly Skolem) constant, with a name-formula
$N_d$;

— replaces each placeholder that corresponds to a variable $w$ not occurring on a
cycle in $g$ with $\top_1$.

It is easy to verify that, with $\theta$ defined in this way, $(\phi\theta)^{\mathcal M}$ is not empty. □

Theorem 3.6. Let $\mathcal S$ be a schema, $q$, $q'$ two queries of the same arity, and
$\Phi_{\mathcal S \not\models q \subseteq q'}$ the formula obtained as specified above. If $\Phi_{\mathcal S \not\models q \subseteq q'}$ has a pseudo-tree
admissible model then $\mathcal S \not\models q \subseteq q'$.

Proof. We show how to construct from a pseudo-tree admissible model $\mathcal M$ of
$\Phi_{\mathcal S \not\models q \subseteq q'}$ a model $\mathcal I$ of $\mathcal S$ in which there is a tuple $\vec a$ of objects such that $\vec a \in q^{\mathcal I}$ and
$\vec a \notin q'^{\mathcal I}$. $\mathcal I$ is built as follows:

— $\Delta^{\mathcal I} = \top_1^{\mathcal M}$;

— $P^{\mathcal I} = \{(s_1, \ldots, s_n) \mid \exists s' \in P^{\mathcal M} . ((s', s_i) \in f_i^{\mathcal M}, \text{ for } i \in \{1, \ldots, n\})\}$, for each
atomic relation $P$ of arity $n$;

— $A^{\mathcal I} = A^{\mathcal M}$, for each atomic concept $A$;

— $t^{\mathcal I} = s \in N_t^{\mathcal M}$, for each constant and Skolem constant $t$ in $q$ and $q'$.

To show that $\mathcal I$ does the job, we have to show that:

(1) $\mathcal I$ is a model of $\mathcal S$;

(2) $conj_1(\vec a, \vec b_1, \vec c_1) \vee \cdots \vee conj_m(\vec a, \vec b_m, \vec c_m)$ is true in $\mathcal I$, i.e., there is one $j \in
\{1, \ldots, m\}$ such that $conj_j(\vec a, \vec b_j, \vec c_j)$ is true in $\mathcal I$;

(3) $\neg\exists \vec z_1 . conj'_1(\vec a, \vec z_1, \vec c'_1) \wedge \cdots \wedge \neg\exists \vec z_{m'} . conj'_{m'}(\vec a, \vec z_{m'}, \vec c'_{m'})$ is true in $\mathcal I$, i.e., for each
$j \in \{1, \ldots, m'\}$, we have that $\exists \vec z_j . conj'_j(\vec a, \vec z_j, \vec c'_j)$ is false in $\mathcal I$.

To show that $\mathcal I$ is a model of $\mathcal S$ we can exploit the fact that $\mathcal M = (S, \cdot^{\mathcal M})$ is a
model of $\Phi_{\mathcal S}$ and that, since it is admissible, there is no pair of states in $S$ that
represent the same reified tuple. By construction of $\mathcal I$ it is easy to see that all
assertions in $\mathcal S$ are true in $\mathcal I$.

$^6$ The construction in [De Giacomo and Lenzerini 1996] is phrased in the Description Logic $\mathcal{CIQ}$,
and it is used to reduce ABox reasoning to satisfiability. $\mathcal{CIQ}$ and CPDL$_g$ can be seen as syntactic
variants of one another, and our handling of constants, through name-formulae, in CPDL$_g$ is closely
related to the handling of ABoxes in $\mathcal{CIQ}$; the only difference is that for constants in the ABoxes the
unique name assumption is made, while here we do not make such an assumption. However, the
unique name assumption plays no role in the construction of [De Giacomo and Lenzerini 1996],
hence that construction works in our case as well.

To show that there is one $j \in \{1, \ldots, m\}$ such that $conj_j(\vec a, \vec b_j, \vec c_j)$ is true in $\mathcal I$, we
exploit that $\mathcal M$ is an admissible model of $\Phi_{\mathcal S \not\models q \subseteq q'}$. Hence there is a $j \in \{1, \ldots, m\}$
such that $\mathcal M$ is an admissible model of $\Phi_{conj_j}$, and since each name-formula is true
in exactly one state, the claim easily follows.

It remains to show that for each $j \in \{1, \ldots, m'\}$, we have that $\exists \vec z_j . conj'_j(\vec a, \vec z_j, \vec c'_j)$
is false in $\mathcal I$. We show that, if for some $j \in \{1, \ldots, m'\}$ and some substitution
$\vec o = (o_1, \ldots, o_n)$ for the variables $\vec z_j = (z_1, \ldots, z_n)$ we have that $conj'_j(\vec a, \vec o, \vec c'_j)$ is
true in $\mathcal I$, then we get a contradiction to the fact that $\mathcal M$ is a model of $\neg\Phi_{conj'_j}$.

By considering which variables have been assigned to the same objects in $\vec o$, we
get a partition of the variables in $\vec z_j$. Corresponding to such a partition $\pi$ we have
considered in the construction of $\Phi_{conj'_j}$ the formula $\exists \vec w_\pi . conj'_j(\vec a, \vec w_\pi, \vec c'_j)$, obtained
by replacing all variables in the same equivalence class by a representative. Observe
that, as a result, distinct variables in $\vec w_\pi$ are assigned distinct objects in $\vec o$.

Let now $\varphi_\pi$ be the disjunct in $\Phi_{conj'_j}$ obtained from $\exists \vec w_\pi . conj'_j(\vec a, \vec w_\pi, \vec c'_j)$. $\varphi_\pi$ is
a disjunction of formulae, all obtained by replacing in the same formula-template
$\langle U\rangle\delta^\pi_1 \wedge \cdots \wedge \langle U\rangle\delta^\pi_\ell$ the placeholders corresponding to the variables $\vec w_\pi$ either by $\top_1$
or by name-formulae corresponding to constants or Skolem constants.

Let $g_\pi$ be the tuple-graph obtained from $\exists \vec w_\pi . conj'_j(\vec a, \vec w_\pi, \vec c'_j)$. Then, using the
assignment above we can define a homomorphism $\eta$, mapping the nodes of $g_\pi$ to
states of $\mathcal M$, such that nodes corresponding to variables in $g_\pi$ are either mapped
to states representing (possibly Skolem) constants or mapped to distinct states.
Hence, we can apply Lemma 3.5, and conclude that there exists a substitution $\theta$ of
the corresponding formula-template $\langle U\rangle\delta^\pi_1 \wedge \cdots \wedge \langle U\rangle\delta^\pi_\ell$ such that $(\langle U\rangle\delta^\pi_1)\theta \wedge \cdots \wedge
(\langle U\rangle\delta^\pi_\ell)\theta$ is true in $\mathcal M$. This implies that one of the disjuncts in $\varphi_\pi$ is true in $\mathcal M$,
and hence that $\neg\Phi_{conj'_j}$ is false in $\mathcal M$. Thus we get a contradiction. □

The following theorem, which is a consequence of Theorems 3.3, 3.4 and 3.6, 
shows decidability of query containment under constraints in our setting. 

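Theorem 3.7. Let $\mathcal S$ be a schema, $q$, $q'$ two queries of the same arity, and
$\Phi_{\mathcal S \not\models q \subseteq q'}$ the formula obtained as specified above. Then $\mathcal S \not\models q \subseteq q'$ if and only if
$\Phi_{\mathcal S \not\models q \subseteq q'}$ is satisfiable.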

3.4 Complexity of Query Containment 

We now analyze the computational complexity of our algorithm for query
containment.

Theorem 3.8. Let $\mathcal S$ be a schema and $q$ and $q'$ two queries. Then deciding
whether $\mathcal S \models q \subseteq q'$ can be done in time $2^{p(|\mathcal S| + |q| + |q'| \cdot \ell_1^{O(\ell_2)})}$, where $|\mathcal S|$, $|q|$, and $|q'|$
are respectively the sizes of $\mathcal S$, $q$, and $q'$, $\ell_1$ is the sum of the number of variables
in $q$ and the number of constants in $q$ and $q'$, and $\ell_2$ is the number of existentially
quantified variables in $q'$.

Proof. Soundness and completeness of the encoding of query containment $\mathcal S \models
q \subseteq q'$ into unsatisfiability of $\Phi_{\mathcal S \not\models q \subseteq q'}$ follow from Theorem 3.7. With regard
to complexity, since satisfiability in CPDL$_g$ is EXPTIME-complete, it follows that
query containment can be done in time $2^{p(|\Phi_{\mathcal S \not\models q \subseteq q'}|)}$. It is easy to verify that

$|\Phi_{\mathcal S \not\models q \subseteq q'}| = O(|\mathcal S| + |q| + |q'| \cdot \ell_1^{O(\ell_2)})$. □

The previous theorem provides, for query containment $\mathcal S \models q \subseteq q'$, a single
exponential upper bound in the size of $\mathcal S$ and of $q$, and a double exponential upper bound
in the size of $q'$ (note that $|q'|$ is an upper bound for $\ell_2$). The single exponential
upper bound in the size of $\mathcal S$ and of $q$ is tight. Indeed, it follows from
EXPTIME-hardness of satisfiability in CPDL$_g$ (in fact plain PDL [Fischer and Ladner 1979])
and from the fact that any CPDL$_g$ formula can be expressed as a $\mathcal{DLR}_{reg}$ concept.
EXPTIME-hardness in $\mathcal S$ holds even in the case where $\mathcal S$ does not contain regular
expressions. Indeed, the formulae used in the EXPTIME-hardness proof of
satisfiability in PDL [Fischer and Ladner 1979] can be expressed as assertions in $\mathcal{DLR}_{reg}$
not involving regular expressions. It is still open whether the double-exponential
upper bound in the size of $q'$ is tight.

The double exponential upper bound in the size of $q'$ is due to the exponential
blowup in the size of $\Phi_{\mathcal S \not\models q \subseteq q'}$. By analyzing the reduction presented in Section 3.2,
one can observe that such an exponential blowup is only due to those existentially
quantified variables in $q'$ that appear inside a cycle in the tuple-graph for $q'$. Hence,
when the tuple-graph for $q'$ does not contain cycles, we have that $|\Phi_{\mathcal S \not\models q \subseteq q'}| =
O(|\mathcal S| + |q| + |q'|)$, and query containment can be checked in time $2^{p(|\mathcal S| + |q| + |q'|)}$.
A relevant case when this occurs is when (the tuple-graph for) the query on the
right-hand side has the structure of a tree.

Corollary 3.9. Let $\mathcal S$ be a schema, $q$ and $q'$ two queries of the same arity, and
let $q'$ have the structure of a tree. Then deciding whether $\mathcal S \models q \subseteq q'$ can be done
in time $2^{p(|\mathcal S| + |q| + |q'|)}$.

Observe that this gives us an EXPTIME-completeness result for containment of
an arbitrary query in a tree-structured one wrt a schema.
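Whether the single exponential bound applies depends on the absence of cycles in the tuple-graph of $q'$. The following Python sketch tests a sufficient condition, namely that the tuple-graph of a single conjunction, viewed as an undirected graph, is a forest. It is an illustration under stated assumptions and not part of the paper's algorithm: the function name `tuple_graph_is_forest` and the encoding of atoms as pairs (predicate name, tuple of terms) are hypothetical, and unary (concept) atoms are skipped because they only label term nodes.

```python
from typing import Dict, Hashable, List, Tuple

# Hypothetical encoding of an atom: (predicate name, tuple of term names).
Atom = Tuple[str, Tuple[str, ...]]

def tuple_graph_is_forest(atoms: List[Atom]) -> bool:
    """Return True if the undirected tuple-graph of the atoms has no cycle."""
    parent: Dict[Hashable, Hashable] = {}

    def find(x: Hashable) -> Hashable:
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    seen_tuples = set()
    for _, terms in atoms:
        if len(terms) < 2 or terms in seen_tuples:
            continue  # concept atoms add no edges; a repeated tuple is one node
        seen_tuples.add(terms)
        tuple_node = ("tuple", terms)
        for t in terms:
            a, b = find(tuple_node), find(("term", t))
            if a == b:
                return False  # this edge closes a cycle
            parent[a] = b
    return True

print(tuple_graph_is_forest([("R", ("x", "y")), ("R", ("y", "z"))]))
print(tuple_graph_is_forest([("R", ("x", "y")), ("R", ("y", "z")), ("R", ("z", "x"))]))
```

The first call describes a chain and the second a triangle of binary atoms. Since a cycle that passes only through constants does not contribute to the blowup discussed above, a negative answer from this test is conservative.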

Query satisfiability can be considered as a special case of query containment. 
Indeed, given a schema S, a query q is satisfiable wrt S if and only if it is not 
contained in the empty query wrt S. The empty query can be expressed, for 
example, as it(x) <— P(x) A ->P(x), where x is a tuple of variables and P is a new 
atomic relation, both of the same arity as q. 

Corollary 3.10. Let $\mathcal S$ be a schema, and $q$ a query. Then deciding whether $q$
is satisfiable wrt $\mathcal S$ can be done in time $2^{p(|\mathcal S| + |q|)}$.

Again, this result shows EXPTIME-completeness of query satisfiability wrt a
schema.

4. UNDECIDABILITY OF CONTAINMENT OF QUERIES WITH INEQUALITIES

In this section we show that, if we allow for inequalities inside the queries, then
query containment wrt a schema becomes undecidable. The proof of undecidability
exploits a reduction from the unbounded tiling problem [van Emde Boas 1997].
An instance $\mathcal T = (\mathcal D, H, V)$ of the tiling problem is defined by a finite set $\mathcal D$ of
tile types, a horizontal adjacency relation $H \subseteq \mathcal D \times \mathcal D$, and a vertical adjacency
relation $V \subseteq \mathcal D \times \mathcal D$, and consists in determining whether there exists a tiling of the
first quadrant of the integer plane with tiles of type in $\mathcal D$ such that the adjacency
conditions are satisfied. As shown in [Harel 1985; van Emde Boas 1997], the tiling
problem is well suited to show undecidability of variants of modal and dynamic
logics, and the difficult part of the proof usually consists in enforcing that the tiles
lie on an integer grid. To this end we exploit a query containing one inequality.

Formally, given an instance $\mathcal T = (\mathcal D, H, V)$ of the tiling problem, a $\mathcal T$-tiling is a
total function $t : \mathbb N \times \mathbb N \to \mathcal D$, and such a tiling is correct if $(t(i, j), t(i + 1, j)) \in H$
and $(t(i, j), t(i, j + 1)) \in V$, for each $i, j \in \mathbb N$. We reduce the problem of checking
whether there exists a correct $\mathcal T$-tiling to the problem of checking whether $\mathcal S_{\mathcal T} \not\models
q_0 \subseteq q'_0$, for a suitable schema $\mathcal S_{\mathcal T}$ and queries $q_0$ and $q'_0$ containing inequalities.

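Consider an instance $\mathcal T = (\mathcal D, H, V)$ of the tiling problem with tile types
$\mathcal D = \{D_1, \ldots, D_k\}$. We construct a schema $\mathcal S_{\mathcal T}$ using the atomic concepts $Tile$,
$D_1, \ldots, D_k$ and two binary atomic relations $Right$ and $Up$ as follows:

$Tile \sqsubseteq D_1 \sqcup \cdots \sqcup D_k$   (9)

$D_i \sqsubseteq Tile$   for each $i \in \{1, \ldots, k\}$   (10)

$D_i \sqsubseteq \neg D_j$   for each $i, j \in \{1, \ldots, k\}$, $i < j$   (11)

$Tile \sqsubseteq (\leq 1\, [\$1]Right) \sqcap (\leq 1\, [\$1]Up)$   (12)

$Tile \sqsubseteq \exists[\$1](Right \sqcap (\$2 : Tile)) \sqcap \exists[\$1](Up \sqcap (\$2 : Tile))$   (13)

$D_i \sqsubseteq (\bigsqcup_{(D_i, D_j) \in H} \exists[\$1](Right \sqcap (\$2 : D_j))) \sqcap (\bigsqcup_{(D_i, D_j) \in V} \exists[\$1](Up \sqcap (\$2 : D_j)))$   for each $i \in \{1, \ldots, k\}$   (14)

We then define the boolean queries $q_0$ and $q'_0$ as follows:

$q_0() \leftarrow Tile(x)$

$q'_0() \leftarrow Right(x, y) \wedge Up(y, z) \wedge Up(x, y') \wedge Right(y', z') \wedge z \neq z'$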

Theorem 4.1. Let $\mathcal T$ be an instance of the tiling problem, and let $\mathcal S_{\mathcal T}$ be the schema and
$q_0$ and $q'_0$ the two queries defined as specified above. Then there is a correct $\mathcal T$-tiling if
and only if $\mathcal S_{\mathcal T} \not\models q_0 \subseteq q'_0$.

Proof. "$\Rightarrow$" Let $t$ be a correct $\mathcal T$-tiling. We construct an interpretation $\mathcal I_t$ of
$\mathcal S_{\mathcal T}$ as follows:

$\Delta^{\mathcal I_t} = \mathbb N \times \mathbb N$

$Tile^{\mathcal I_t} = \Delta^{\mathcal I_t}$

$D_h^{\mathcal I_t} = \{(i, j) \in \Delta^{\mathcal I_t} \mid t(i, j) = D_h\}$, for each $h \in \{1, \ldots, k\}$

$Right^{\mathcal I_t} = \{((i, j), (i + 1, j)) \mid i, j \in \mathbb N\}$

$Up^{\mathcal I_t} = \{((i, j), (i, j + 1)) \mid i, j \in \mathbb N\}$

It is immediate to verify that $\mathcal I_t$ is a model of $\mathcal S_{\mathcal T}$ and that $q_0^{\mathcal I_t}$ is true while $q_0'^{\mathcal I_t}$ is
false.

"$\Leftarrow$" Consider a model $\mathcal I$ of $\mathcal S_{\mathcal T}$ in which $q_0$ is true and $q'_0$ is false. Then $\mathcal I$
contains an instance $o_0$ of $Tile$, and assertions (13) in $\mathcal S_{\mathcal T}$ force the existence of
arbitrarily long chains of instances of $Tile$, beginning with $o_0$ and connected one to
the next by alternations of $Right^{\mathcal I}$ and $Up^{\mathcal I}$. By assertions (12), $Right$ and $Up$ are
functional for all instances of $Tile$, and since $q'_0$ is false in $\mathcal I$, these chains of objects
indeed form a grid. By assertions (9) and (11), each such object is an instance of
precisely one $D_h$. Hence, we can construct a tiling $t_{\mathcal I}$ by assigning to each object
$o$ of the grid, representing an element of the first quadrant, a unique tile type $D_h$.
Considering also assertions (14), it is easy to show by induction on the length of the
chain from $o_0$ to an instance $o$ of $Tile$ that the horizontal and vertical adjacency
conditions for $o$ are satisfied. Hence $t_{\mathcal I}$ is a correct $\mathcal T$-tiling. □
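The role of the inequality query in the proof can be checked on a finite window of the grid. The Python sketch below is an illustration only: the interpretation in the proof is infinite, whereas here $Right$ and $Up$ are restricted to $\{0, \ldots, n-1\}^2$, and the names `grid` and `q0_prime_holds` are hypothetical. Both relations are stored as dictionaries, which reflects the functionality imposed by assertions (12).

```python
from itertools import product
from typing import Dict, Tuple

Point = Tuple[int, int]

def grid(n: int) -> Tuple[Dict[Point, Point], Dict[Point, Point]]:
    """Right and Up restricted to the finite window {0..n-1} x {0..n-1}."""
    right = {(i, j): (i + 1, j) for i, j in product(range(n), repeat=2) if i + 1 < n}
    up = {(i, j): (i, j + 1) for i, j in product(range(n), repeat=2) if j + 1 < n}
    return right, up

def q0_prime_holds(right: Dict[Point, Point], up: Dict[Point, Point]) -> bool:
    """Evaluate Right(x,y), Up(y,z), Up(x,y'), Right(y',z'), z != z'."""
    for x, y in right.items():
        z = up.get(y)                # Right then Up
        z2 = right.get(up.get(x))    # Up then Right
        if z is not None and z2 is not None and z != z2:
            return True
    return False

right, up = grid(4)
print(q0_prime_holds(right, up))     # the two paths commute on a true grid

right[(0, 1)] = (2, 1)               # break commutation at one point
print(q0_prime_holds(right, up))
```

On the unmodified window the two paths from each point meet in the same place, so the query has no match; after the single perturbation the paths from $(0,0)$ end in different points and the query becomes true.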

The theorem above immediately implies undecidability of containment wrt a
schema of queries containing inequalities.

Theorem 4.2. Let $\mathcal S$ be a schema, and $q$, $q'$ two queries of the same arity that
may contain atoms of the form $t \neq t'$. Then the query containment problem $\mathcal S \models
q \subseteq q'$ is undecidable.

The reduction used in the proof of Theorem 4.1 shows that query containment
remains undecidable even in the restricted case where:

— $\mathcal S$ does not contain assertions on relations, and all assertions on concepts are of
the form $A \sqsubseteq C$,

— $\mathcal S$, $q$, and $q'$ do not contain regular expressions,

— $q$ and $q'$ do not contain union or constants, and

— there is a single inequality in $q'$, and no inequality in $q$.

Making use of a more involved proof, it is possible to show that the reduction
used in Theorem 4.1 works also if one omits from $\mathcal S_{\mathcal T}$ assertions (12) specifying
functionality of $Right$ and $Up$. In this case, a model $\mathcal I$ of $\mathcal S_{\mathcal T}$ in which $q_0$ is true and
$q'_0$ is false no longer determines a unique grid, but it is nevertheless possible to
extract from $\mathcal I$ a correct $\mathcal T$-tiling.

5. QUERY ANSWERING 

As we said in the introduction, it is well known in the database literature that
there is a tight connection between the problems of conjunctive query containment
and conjunctive query answering [Chandra and Merlin 1977]. Such a relationship
has had a particular importance in settings of databases with incomplete
information, such as those arising in information integration [Abiteboul and Duschka 1998;
Lenzerini 2002], semistructured data [Calvanese et al. 2002], and Description Logics
[Baader et al. 2003]. In this section we discuss query answering under Description
Logics constraints, taking advantage of the results on query containment presented
above. By query answering under Description Logics constraints we mean
computing the answers to a query over an incomplete database, i.e., a database that is
partially specified and must satisfy all Description Logic constraints expressed in a
schema.$^7$

Given a $\mathcal{DLR}_{reg}$ schema $\mathcal S$, we specify an incomplete database $\mathcal D$ over $\mathcal S$ by means
of a set of facts, called membership assertions, of the form

$C(a)$     $R(\vec a)$

where $C$ and $R$ are respectively a concept expression and a relation expression over
$\mathcal S$, $a$ is a constant, and $\vec a$ is a tuple of constants of the same arity as $R$. Note that
such a notion of incomplete database corresponds to that of ABox in Description
Logics [Baader et al. 2003].

An interpretation $\mathcal I$ satisfies an assertion $C(a)$ if $a^{\mathcal I} \in C^{\mathcal I}$, and it satisfies an
assertion $R(\vec a)$ if $\vec a^{\mathcal I} \in R^{\mathcal I}$. We say that $\mathcal I$ is a model of $\mathcal D$ if it satisfies all
assertions in $\mathcal D$. An incomplete database $\mathcal D$ is satisfiable with respect to a schema
$\mathcal S$ if there is an interpretation $\mathcal I$ that is a model of both $\mathcal S$ and $\mathcal D$. Intuitively, every
such interpretation $\mathcal I$ represents a complete database that is coherent with both $\mathcal D$
and the Description Logic constraints in $\mathcal S$.

Given a schema $\mathcal S$, an incomplete database $\mathcal D$ over $\mathcal S$, and a query $q$ for $\mathcal S$, the
set of certain answers $cert(q, \mathcal S, \mathcal D)$ of $q$ with respect to $\mathcal S$ and $\mathcal D$ is the set of tuples
$\vec c$ of constants in $\mathcal D$ that are answers to $q$ for all complete databases coherent with
$\mathcal D$ and $\mathcal S$, i.e., such that $\vec c \in q^{\mathcal I}$, for all models $\mathcal I$ of $\mathcal S$ and $\mathcal D$.

Given a query

$q(\vec x) \leftarrow conj_1(\vec x, \vec y_1, \vec c_1) \vee \cdots \vee conj_m(\vec x, \vec y_m, \vec c_m)$

in order to check whether a tuple $\vec c$ of constants is in $cert(q, \mathcal S, \mathcal D)$, we can resort to
query containment [Abiteboul and Duschka 1998]. In particular, let us define the
boolean (i.e., of arity 0) queries $Q_{\mathcal D}$ and $Q_{q,\vec c}$ as follows:

$Q_{\mathcal D}() \leftarrow \bigwedge_{C(a) \in \mathcal D} C(a) \wedge \bigwedge_{R(\vec a) \in \mathcal D} R(\vec a)$

$Q_{q,\vec c}() \leftarrow conj_1(\vec c, \vec y_1, \vec c_1) \vee \cdots \vee conj_m(\vec c, \vec y_m, \vec c_m)$

The first query $Q_{\mathcal D}$ is the conjunction of all facts in $\mathcal D$, while the second query $Q_{q,\vec c}$
is obtained from $q$ by replacing each variable in $\vec x$ with the corresponding constant
in $\vec c$.

Theorem 5.1. Let $\mathcal S$ be a schema, $\mathcal D$ an incomplete database over $\mathcal S$, $q$ a query
for $\mathcal S$, and $\vec c$ a tuple of constants in $\mathcal D$ of the same arity as $q$. Then $\vec c \in cert(q, \mathcal S, \mathcal D)$
if and only if $\mathcal S \models Q_{\mathcal D} \subseteq Q_{q,\vec c}$.

Proof. The result can be proved exactly as in [Abiteboul and Duschka 1998]. □
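The two boolean queries used in this reduction are straightforward to build. The Python sketch below shows one possible construction under stated assumptions: atoms are encoded as pairs (predicate or expression name, tuple of terms), a union of conjunctive queries is a list of lists of atoms, and the function names `database_query` and `instantiate` are hypothetical. The containment test itself is not implemented here.

```python
from typing import Dict, List, Tuple

# Hypothetical encodings: an atom is (name, terms); a UCQ body is a list of conjunctions.
Atom = Tuple[str, Tuple[str, ...]]
UCQ = List[List[Atom]]

def database_query(facts: List[Atom]) -> UCQ:
    """Q_D: a boolean query whose single disjunct is the conjunction of all facts."""
    return [list(facts)]

def instantiate(head_vars: Tuple[str, ...], body: UCQ, c: Tuple[str, ...]) -> UCQ:
    """Q_{q,c}: replace each distinguished variable of q by the matching constant of c."""
    if len(head_vars) != len(c):
        raise ValueError("tuple arity differs from query arity")
    sub: Dict[str, str] = dict(zip(head_vars, c))
    return [[(p, tuple(sub.get(t, t) for t in terms)) for p, terms in conj]
            for conj in body]

facts = [("Person", ("ann",)), ("parent", ("ann", "bob"))]
q_body = [[("parent", ("x", "y"))]]          # q(x) <- parent(x, y)
print(database_query(facts))
print(instantiate(("x",), q_body, ("ann",)))
```

Deciding whether the tuple is a certain answer then amounts to passing the two resulting queries, together with the schema, to a procedure for containment under constraints.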

From Theorem 3.8 we immediately obtain the following complexity result.

Theorem 5.2. Let $\mathcal S$ be a schema, $\mathcal D$ an incomplete database over $\mathcal S$, $q$ a query
for $\mathcal S$, and $\vec c$ a tuple of constants in $\mathcal D$ of the same arity as $q$. Then deciding whether
$\vec c \in cert(q, \mathcal S, \mathcal D)$ can be done in time $2^{p(|\mathcal S| + |\mathcal D| + |q| \cdot d^{O(\ell)})}$, where $|\mathcal S|$, $|\mathcal D|$, and $|q|$ are
respectively the sizes of $\mathcal S$, $\mathcal D$, and $q$, $d$ is the number of constants in $\mathcal D$ and $q$, and
$\ell$ is the number of existentially quantified variables in $q$.

Note that this means that, while query answering is double exponential in
combined complexity, it is actually only single exponential in the number of constants
in the database. It follows that our technique is exponential in data complexity,
i.e., the complexity measured only with respect to the size of $\mathcal D$.

$^7$ Note that, in the case in which we have complete information on the database, the constraints do
not play any role in query answering, assuming that the database is consistent with them.

Finally, it follows directly from the semantics that satisfiability of a given
incomplete database $\mathcal D$ with respect to a schema $\mathcal S$ can be rephrased as satisfiability
of the query $Q_{\mathcal D}$ with respect to $\mathcal S$. Thus, we obtain the following result.

Corollary 5.3. Let $\mathcal S$ be a schema and $\mathcal D$ an incomplete database over $\mathcal S$. Then
deciding whether $\mathcal D$ is satisfiable with respect to $\mathcal S$ can be done in time $2^{p(|\mathcal S| + |\mathcal D|)}$.

In Description Logics jargon, this shows EXPTIME-completeness of TBox+ABox
satisfiability in our setting. Observe that, since we allow for union of conjunctive
queries on the left-hand side query in the containment, this result can be
immediately extended to satisfiability of a TBox together with a disjunction of ABoxes
[Calvanese et al. 2001].

6. CONCLUSIONS 

In this paper we have introduced $\mathcal{DLR}_{reg}$, an expressive language for specifying
database schemas and non-recursive Datalog queries, and we have presented
decidability (with complexity) and undecidability results for both the problem of checking
query containment and the problem of answering queries under the constraints
expressed in the schema.

The query language considered in this paper allows no form of recursion, not
even the transitive closure of binary relations. It is our aim in the future to extend
our analysis to the case where queries may contain regular expressions, in the spirit
of [Calvanese et al. 2000].